Tcga: Difference between revisions

From 太極

Revision as of 16:04, 11 April 2023

Resources

Tumor vs normal

Drug response

  • Evaluating the molecule-based prediction of clinical drug responses in cancer Ding 2016 Bioinformatics, "bioinformatics_32_19_2891_s1.zip". Table S2 has 2572 rows.
    library(readxl)
    library(knitr)  # provides kable()
    dat <- read_excel("~/Downloads/bioinformatics_32_19_2891_s1/bioinfo16_supplementary_tables.xlsx", 
                      "Table S2", skip=2)
    dat <- dat[-1, ]
    dim(dat)
    # [1] 2572   14
    kable(table(dat$Cancer))
    |Var1                                                                    | Freq|
    |:-----------------------------------------------------------------------|----:|
    |Adrenocortical carcinoma (ACC)                                          |   13|
    |Bladder Urothelial Carcinoma (BLCA)                                     |  164|
    |Brain Lower Grade Glioma (LGG)                                          |  162|
    |Breast invasive carcinoma (BRCA)                                        |  389|
    |Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) |   97|
    |Colon adenocarcinoma (COAD)                                             |  192|
    |Esophageal carcinoma (ESCA)                                             |   25|
    |Glioblastoma multiforme (GBM)                                           |   10|
    |Head and Neck squamous cell carcinoma (HNSC)                            |  112|
    |Kidney Chromophobe (KICH)                                               |    2|
    |Kidney renal clear cell carcinoma (KIRC)                                |   14|
    |Kidney renal papillary cell carcinoma (KIRP)                            |   14|
    |Liver hepatocellular carcinoma (LIHC)                                   |   29|
    |Lung adenocarcinoma (LUAD)                                              |  151|
    |Lung squamous cell carcinoma (LUSC)                                     |   69|
    |Mesothelioma (MESO)                                                     |   80|
    |Ovarian serous cystadenocarcinoma (OV)                                  |   11|
    |Pancreatic adenocarcinoma (PAAD)                                        |   99|
    |Pheochromocytoma and Paraganglioma (PCPG)                               |    7|
    |Prostate adenocarcinoma (PRAD)                                          |   45|
    |Rectum adenocarcinoma (READ)                                            |   67|
    |Sarcoma (SARC)                                                          |  101|
    |Skin Cutaneous Melanoma (SKCM)                                          |  137|
    |Stomach adenocarcinoma (STAD)                                           |  243|
    |Testicular Germ Cell Tumors (TGCT)                                      |  159|
    |Thyroid carcinoma (THCA)                                                |   10|
    |Uterine Carcinosarcoma (UCS)                                            |   83|
    |Uterine Corpus Endometrial Carcinoma (UCEC)                             |   87|
    
    kable(table(dat$drug_name))
    |Var1                                               | Freq|
    |:--------------------------------------------------|----:|
    |Aldesleukin                                        |    6|
    |Alverine                                           |    1|
    |Anastrozole                                        |   18|
    |anti-A5B1 integrin monoclonal antibody PF-04605412 |    1|
    |anti-endosialin/TEM1 monoclonal antibody MORAb-004 |    1|
    |autologous vaccine                                 |    1|
    |Axitinib                                           |    3|
    |AZD2171                                            |    1|
    |Bacillus Calmette-Guerin (BCG)                     |    1|
    |BCG                                                |    2|
    |Bevacizumab                                        |   48|
    |Bicalutamide                                       |   17|
    |Bleomycin                                          |   54|
    |BRAF inhibitor                                     |    1|
    |Cabazitaxel                                        |    2|
    |Cabozantinib                                       |    1|
    |Cancer Vax                                         |    2|
    |Capecitabine                                       |   55|
    |Carboplatin                                        |  181|
    |Carmustine                                         |    6|
    |Cetuximab                                          |   21|
    |Chemo, Multi-Agent, NOS                            |    1|
    |Chemo, NOS                                         |    3|
    |Cilengtide                                         |    1|
    |Cisplatin                                          |  330|
    |Copolang                                           |    5|
    |COPOLANG CAPS                                      |    1|
    |Cyclophosphamide                                   |  103|
    |cyclophosphamide, vincristine, and dacarbazine     |    1|
    |Cyclosporine                                       |    1|
    |Dabrafenib                                         |    4|
    |Dacarbazine                                        |   30|
    |Dactinomycin                                       |    3|
    |Dasatinib                                          |    8|
    |Degarelix                                          |    1|
    |Denosumab                                          |    2|
    |Dexamethasone                                      |    6|
    |Didox                                              |    3|
    |Docetaxel                                          |  106|
    |Docetaxel +/- Zactima                              |    1|
    |Doxorubicin                                        |  108|
    |doxorubicin/cyclophosphamide                       |    1|
    |E7389                                              |    2|
    |Enoticumab                                         |    1|
    |Epirubicin                                         |   28|
    |Epoetin alfa                                       |    1|
    |Eribulin                                           |    1|
    |Erlotinib                                          |    7|
    |Etoposide                                          |   87|
    |Everolimus                                         |    6|
    |Everolimus, Gemcitabine, and Cisplatin             |    1|
    |Exemestane                                         |    3|
    |EZN-2968                                           |    1|
    |Fluorouracil                                       |  212|
    |Folfiri                                            |    1|
    |Folfox                                             |    2|
    |FOLFOX                                             |    2|
    |Fotemustine                                        |    3|
    |Fulvestrant                                        |    1|
    |Gefitinib                                          |    2|
    |Gemcitabine                                        |  165|
    |Gemox                                              |    1|
    |Goserelin                                          |    8|
    |GP-100                                             |    1|
    |GP100                                              |    1|
    |HSC vaccine injection                              |    1|
    |Hydrocortisone                                     |    1|
    |Hydroxyurea                                        |    1|
    |Ifosfamid                                          |    1|
    |Ifosfamide                                         |   24|
    |Imatinib                                           |    3|
    |Infliximab                                         |    2|
    |Interferon alfa-n1                                 |    6|
    |Interferon alfacon-1                               |    8|
    |iodine I 131 monoclonal antibody 81C6              |    1|
    |Ipilimumab                                         |   11|
    |Irinotecan                                         |   30|
    |Ixabepilone                                        |    1|
    |Ketoconazole                                       |    1|
    |Lapatinib                                          |    2|
    |Letrozole                                          |    5|
    |Leucovorin                                         |   93|
    |Leuprolide                                         |   16|
    |Levothyroxine                                      |    1|
    |Liothyronine                                       |    7|
    |Lomustine                                          |   11|
    |LY228820                                           |    1|
    |Megestrol acetate                                  |    2|
    |MEL-44                                             |    2|
    |Melphalan                                          |    6|
    |Methotrexate                                       |   15|
    |Methylprednisolone                                 |    1|
    |Mitomycin                                          |    7|
    |Mitotane                                           |    1|
    |Mitoxantrone                                       |    1|
    |Mycophenolic acid                                  |    6|
    |Nilutamide                                         |    2|
    |nivolumab                                          |    1|
    |Ondansetron                                        |    1|
    |Oxaliplatin                                        |   75|
    |Paclitaxel                                         |  172|
    |Pamidronate                                        |    3|
    |Panitumumab                                        |    1|
    |Pazopanib                                          |    6|
    |Pegfilgrastim                                      |    6|
    |Pemetrexed                                         |   44|
    |PI-88                                              |    1|
    |Platinum                                           |    5|
    |PNU-159548                                         |    1|
    |Poly E                                             |    1|
    |Polyplatillen                                      |    2|
    |Procarbazine                                       |    8|
    |px-866                                             |    1|
    |R1507                                              |    1|
    |Raloxifene                                         |    1|
    |recMAGE- A3                                        |    1|
    |recombinant interferon-α2b                         |    1|
    |Regorafenib                                        |    1|
    |RenAmin                                            |    1|
    |Resiquimod                                         |    1|
    |ridaforolimus                                      |    1|
    |rigosertib                                         |    1|
    |Rituximab                                          |    1|
    |Sargramostim                                       |    2|
    |Sorafenib                                          |   17|
    |Streptozocin                                       |    1|
    |Sulindac                                           |    1|
    |Sunitinib                                          |   10|
    |Talimogene Laherparepvec (T-VEC)                   |    1|
    |Tamoxifen                                          |   24|
    |tegafur-gimeracil-oteracil potassium               |    3|
    |Temozolomide                                       |  116|
    |Temsirolimus                                       |    3|
    |Thalidomide                                        |    1|
    |Themozolomide                                      |    2|
    |Threshold-302                                      |    1|
    |Topotecan                                          |    4|
    |Toremifene                                         |    1|
    |Trabectedin                                        |    3|
    |Trametinib                                         |    2|
    |Trastuzumab                                        |   17|
    |Trelstar                                           |    2|
    |triptorelin                                        |    1|
    |Tyrosine kinase inhibitor                          |    1|
    |veliparib                                          |    2|
    |Vemurafenib                                        |    3|
    |Vinblastine                                        |   16|
    |Vincristine                                        |   13|
    |Vinorelbine                                        |   31|
    |Vorinostat                                         |    3|
    |Yervoy                                             |    2|
    |Zoledronate                                        |    2|
    
    kable(table(dat$Cancer[dat$drug_name == "Gemcitabine"]))
    |Var1                                                                    | Freq|
    |:-----------------------------------------------------------------------|----:|
    |Bladder Urothelial Carcinoma (BLCA)                                     |   48|
    |Breast invasive carcinoma (BRCA)                                        |    1|
    |Cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) |    2|
    |Esophageal carcinoma (ESCA)                                             |    2|
    |Liver hepatocellular carcinoma (LIHC)                                   |    3|
    |Lung adenocarcinoma (LUAD)                                              |    7|
    |Lung squamous cell carcinoma (LUSC)                                     |   10|
    |Mesothelioma (MESO)                                                     |    6|
    |Pancreatic adenocarcinoma (PAAD)                                        |   60|
    |Pheochromocytoma and Paraganglioma (PCPG)                               |    1|
    |Sarcoma (SARC)                                                          |   22|
    |Skin Cutaneous Melanoma (SKCM)                                          |    1|
    |Uterine Carcinosarcoma (UCS)                                            |    1|
    |Uterine Corpus Endometrial Carcinoma (UCEC)                             |    1|
    
    kable(table(dat$drug_name[dat$Cancer == "Pancreatic adenocarcinoma (PAAD)"]))
    |Var1             | Freq|
    |:----------------|----:|
    |Capecitabine     |    6|
    |Carboplatin      |    1|
    |Cyclophosphamide |    1|
    |Docetaxel        |    1|
    |Doxorubicin      |    1|
    |Erlotinib        |    1|
    |Fluorouracil     |   13|
    |Gemcitabine      |   60|
    |Irinotecan       |    3|
    |Leucovorin       |    4|
    |Oxaliplatin      |    6|
    |Paclitaxel       |    2|
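
The drug_name table above lists several spelling and case variants of the same agent (for example Folfox/FOLFOX, GP-100/GP100, Ifosfamid/Ifosfamide, Yervoy/Ipilimumab). A minimal sketch of one way to collapse such variants before tabulating, using base R only. The toy vector, the `synonyms` lookup and the helper name `harmonize_drug` are illustrative assumptions, not part of the supplementary table; a real mapping should be curated and checked by hand.

```r
# Toy stand-in for dat$drug_name (values copied from the table above)
drug_name <- c("Folfox", "FOLFOX", "GP-100", "GP100", "Ifosfamid",
               "Ifosfamide", "Yervoy", "Ipilimumab", "Gemcitabine")

# Hypothetical hand-curated lookup: variant (upper case) -> preferred name
synonyms <- c("GP-100"    = "GP100",
              "IFOSFAMID" = "IFOSFAMIDE",
              "YERVOY"    = "IPILIMUMAB")  # Yervoy is a brand name of ipilimumab

harmonize_drug <- function(x, lookup) {
  key <- toupper(trimws(x))            # remove case and whitespace differences
  hit <- key %in% names(lookup)        # which entries have a known synonym
  key[hit] <- unname(lookup[key[hit]]) # replace them with the preferred name
  key
}

clean <- harmonize_drug(drug_name, synonyms)
table(clean)
```

With the real data the same helper would be applied as `harmonize_drug(dat$drug_name, synonyms)`; names without an entry in the lookup are only upper-cased and trimmed.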
    
  • The above data were used in Predicting cancer prognosis and drug response from the tumor microbiome, Hermida 2022.
  • TCGA immunotherapy treated melanoma data. This uses the recount::all_metadata() function.
    # Get all metadata 
    metadata_clean <- recount::all_metadata("tcga")
    dim(metadata_clean)
    # [1] 11284   864
    kable(table(metadata_clean$gdc_cases.project.project_id))
    |Var1      | Freq|
    |:---------|----:|
    |TCGA-ACC  |   79|
    |TCGA-BLCA |  433|
    |TCGA-BRCA | 1246|
    |TCGA-CESC |  309|
    |TCGA-CHOL |   45|
    |TCGA-COAD |  546|
    |TCGA-DLBC |   48|
    |TCGA-ESCA |  198|
    |TCGA-GBM  |  175|
    |TCGA-HNSC |  548|
    |TCGA-KICH |   91|
    |TCGA-KIRC |  616|
    |TCGA-KIRP |  323|
    |TCGA-LAML |  126|
    |TCGA-LGG  |  532|
    |TCGA-LIHC |  424|
    |TCGA-LUAD |  601|
    |TCGA-LUSC |  555|
    |TCGA-MESO |   87|
    |TCGA-OV   |  430|
    |TCGA-PAAD |  183|
    |TCGA-PCPG |  187|
    |TCGA-PRAD |  558|
    |TCGA-READ |  177|
    |TCGA-SARC |  265|
    |TCGA-SKCM |  473|
    |TCGA-STAD |  453|
    |TCGA-TGCT |  156|
    |TCGA-THCA |  572|
    |TCGA-THYM |  122|
    |TCGA-UCEC |  589|
    |TCGA-UCS  |   57|
    |TCGA-UVM  |   80|
    
    # Get only PAAD project
    x <- metadata_clean[metadata_clean$gdc_cases.project.project_id == "TCGA-PAAD",]
    dim(x)
    # [1] 183 864
    class(x)
    # [1] "DFrame"
    x$xml_tumor_response_cdus_type
      [1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
     [17] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
     [33] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
     [49] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
     [65] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
     [81] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
     [97] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
    [113] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
    [129] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
    [145] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
    [161] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
    [177] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
    Levels:  Complete response Partial response Progression Stable
    
    library(knitr)
    kable(table(toupper(x$cgc_drug_therapy_drug_name)))
    |Var1                  | Freq|
    |:---------------------|----:|
    |5 FU                  |    3|
    |5-FLUOROURACIL        |    3|
    |5-FU                  |    4|
    |5FU                   |    1|
    |ABRAXANE              |    2|
    |CAPECITABINE          |    2|
    |CHEMO, NOS            |    2|
    |CISPLATIN             |    2|
    |CYCLOPHOSPHAMIDE      |    1|
    |DOCETAXEL             |    1|
    |FLUOROURACIL          |    2|
    |FOLINIC ACID          |    1|
    |FU7                   |    1|
    |GEMCITABINE           |   68|
    |GEMCITABINE INJECTION |    1|
    |GEMCITIBINE           |    1|
    |GEMZAR                |    7|
    |LEUCOVORIN            |    3|
    |LEUCOVORIN CALCIUM    |    2|
    |OXALIPLATIN           |    4|
    |XELODA                |    2|
    
    x$gdc_cases.submitter_id
    sum(x$gdc_cases.project.project_id == "TCGA-PAAD")
    # [1] 183
    x$cgc_case_primary_therapy_outcome_success
    x$cgc_case_id == "TCGA-F2-6880"
    x$xml_bcr_patient_barcode
    x$xml_vital_status
    x$xml_tumor_type
    x$xml_primary_therapy_outcome_success
    colnames(x)[c(198, 359)] # Open the dumped xlsx file, search "complete"
    # [1] "cgc_case_primary_therapy_outcome_success" "xml_primary_therapy_outcome_success" 
    # the cgc column is character and the xml column is a factor
    
    # write.table(as.matrix(x), file = "x.txt", sep="\t") NOT WORKING
    writexl::write_xlsx(as.data.frame(x), "~/Downloads/x.xlsx")
    
    kable(table(x[x$gdc_cases.project.project_id == "TCGA-PAAD", 
                  "xml_primary_therapy_outcome_success"] ))
    |Var1                        | Freq|
    |:---------------------------|----:|
    |                            |    0|
    |Complete Remission/Response |   43|
    |Partial Remission/Response  |    8|
    |Progressive Disease         |   40|
    |Stable Disease              |    8|
    |0                           |    0|
    |1                           |    0|
    |2                           |    0|
    |NO                          |    0|
    |YES                         |    0|
    
    x[x$gdc_cases.project.project_id == "TCGA-PAAD" & !is.na(x$xml_primary_therapy_outcome_success), 
      c("xml_bcr_patient_barcode", "xml_vital_status", "cgc_case_pathologic_stage", 
        "cgc_case_primary_therapy_outcome_success")] |> dim()
    # [1] 99  4
    
    x[x$gdc_cases.project.project_id == "TCGA-PAAD", "cgc_case_primary_therapy_outcome_success"] |> table() |> kable()
    |Var1                        | Freq|
    |:---------------------------|----:|
    |Complete Remission/Response |   44|
    |Partial Remission/Response  |    8|
    |Progressive Disease         |   40|
    |Stable Disease              |    8|
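
The two outcome columns do not agree exactly: the xml table above sums to 99 and the cgc table to 100, and one column is a factor while the other is character. A minimal sketch of how the disagreeing rows could be located, shown on a toy data frame. The toy values, the column names `cgc`/`xml` and the helper name `find_mismatch` are assumptions for illustration; with the real data, `x` would first be converted with `as.data.frame()`.

```r
# Toy stand-in for the two outcome columns (values are made up)
toy <- data.frame(
  barcode = c("P1", "P2", "P3", "P4"),
  cgc = c("Complete Remission/Response", "Stable Disease", NA,
          "Complete Remission/Response"),
  xml = factor(c("Complete Remission/Response", "Stable Disease", NA, NA)),
  stringsAsFactors = FALSE
)

find_mismatch <- function(df, a, b) {
  va <- as.character(df[[a]])   # factor -> character so values are comparable
  vb <- as.character(df[[b]])
  both_na <- is.na(va) & is.na(vb)                 # missing in both: not a mismatch
  same    <- !is.na(va) & !is.na(vb) & va == vb    # present in both and equal
  df[!(both_na | same), , drop = FALSE]
}

res <- find_mismatch(toy, "cgc", "xml")
res
```

Rows where only one of the two columns is missing are reported as mismatches, which is the pattern that would explain a one-row difference between the two tables.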
    

Understand TCGA Barcode/Sample ID

https://docs.gdc.cancer.gov/Encyclopedia/pages/TCGA_Barcode/

The TCGA barcode "TCGA.06.0675.11A.32R.A36H.07" is a standardized identifier used by The Cancer Genome Atlas (TCGA) project for an aliquot derived from a patient sample. Barcodes are normally written with hyphens (TCGA-06-0675-11A-32R-A36H-07); the dots typically appear when R converts barcodes into syntactically valid column names. The barcode encodes the tissue source site, the participant, the sample type and vial, the portion and analyte, the plate and the analysis center.

Here's a breakdown of the label components, following the GDC page above:

  • "TCGA" - The project name, used as the prefix for all TCGA samples.
  • "06" - The tissue source site (TSS), i.e. the institution that collected the sample; each TSS code is tied to a study. In the GDC TSS code table, 06 is Henry Ford Hospital, glioblastoma multiforme (GBM).
  • "0675" - The participant ID, an identifier assigned to each patient whose samples were included in the TCGA study.
  • "11A" - The sample type + vial. Tumor types range from 01 - 09, normal types from 10 - 19 and control samples from 20 - 29, so 11 is a normal sample (Solid Tissue Normal). "A" is the vial, i.e. the order of this sample in a sequence of samples from the same participant (vial = a tube for collecting something).
  • "32R" - The portion + analyte. "32" is the portion number within the sample and "R" is the analyte type, here RNA.
  • "A36H" - The plate ID, identifying the 96-well plate that held the aliquot.
  • "07" - The center code, identifying the center that received the aliquot for analysis. In the GDC center code table, 07 is the University of North Carolina.
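
The field layout described on the GDC barcode page (project, tissue source site, participant, sample type + vial, portion + analyte, plate, center) can be sketched as a small base R parser. The function name `parse_tcga_barcode` and the names of the returned list elements are hypothetical, chosen for illustration; the function assumes a full seven-field aliquot barcode.

```r
# Split a full TCGA aliquot barcode into its fields.
# Accepts "-" or "." as the separator ("." shows up in R column names).
parse_tcga_barcode <- function(barcode) {
  parts <- strsplit(barcode, "[-.]")[[1]]
  stopifnot(length(parts) == 7)              # only full aliquot barcodes
  list(
    project     = parts[1],
    tss         = parts[2],                  # tissue source site
    participant = parts[3],
    sample_type = substr(parts[4], 1, 2),    # 01-09 tumor, 10-19 normal, 20-29 control
    vial        = substr(parts[4], 3, 3),
    portion     = substr(parts[5], 1, 2),
    analyte     = substr(parts[5], 3, 3),    # e.g. R = RNA, D = DNA
    plate       = parts[6],
    center      = parts[7]
  )
}

b <- parse_tcga_barcode("TCGA.06.0675.11A.32R.A36H.07")
b$sample_type
# TRUE for tumor sample types (01-09), FALSE otherwise
is_tumor <- as.integer(b$sample_type) < 10
is_tumor
```

For tumor vs normal comparisons, the two-digit sample type is usually the only field needed; the first three fields together identify the patient.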