Finding Consensus with CellPhoneDBv5

Author

Ho Jin Ker Nigel, Adaikalavan Ramasamy, Gokce Oguz

Published

July 18, 2025

This document is for outlining the mapping process of interactions generated from cell-cell communication (CCC) analysis - using both CellPhoneDBv5 and CellChatv2.

CellChat is used as the reference CCC tool, and we want to find out which interactions in CellChat are supported by CellPhoneDB. To ensure high confidence and unbiased mapping, we convert all gene symbols to their respective Uniprot IDs in both CCC tools, and do the same for the subunits of complexes.

1. Setup

1.1. R setup

pacman::p_load(tidyverse, readxl, writexl, janitor, CellChat, ggvenn, biomaRt, 
               conflicted, tidytext, DT, kableExtra)
setwd(this.path::here())
rm(list = ls())

conflict_prefer("select", "dplyr")
conflict_prefer("filter", "dplyr")
conflict_prefer("setdiff", "base")

# removes NA from all kable
options(knitr.kable.NA = "") 

1.2. Folder setup

sapply( c("data", "results"), dir.create, recursive = TRUE )

1.3. Download CellPhoneDB data files

These files were downloaded on 11th July 2025.

basepath <- "https://raw.github.com/ventolab/cellphonedb-data/refs/heads/master"

# Main database containing the interaction
download.file( file.path(basepath, "cellphonedb.zip"), "data/cellphonedb.zip" )
unzip("data/cellphonedb.zip", exdir = "data", junkpaths = TRUE)

# Additional files for mapping
download.file( file.path(basepath, "data/complex_input.csv"), "data/complex_input.csv" )
download.file( file.path(basepath, "data/protein_input.csv"), "data/protein_input.csv" )
download.file( file.path(basepath, "data/sources/uniprot_synonyms.tsv"), "data/uniprot_synonyms.tsv" )

2. Complex Mapping

2.1. Mapping CellPhoneDB complexes to Uniprot IDs

CPDB.complex2protein <- read_csv("data/complex_input.csv") %>% 
  select(complex_name, starts_with("uniprot"))

example <- c("12oxoLeukotrieneB4_byPTGR1", "TREM2_receptor", "RAreceptor_RARA_RXRA",
             "Pregnenolone_byCYP11A1", "5HTR3_complex")
CPDB.complex2protein %>% 
  filter(complex_name %in% example) %>% 
  mutate(complex_name = factor(complex_name, levels = example)) %>% 
  arrange(complex_name) %>% 
  knitr::kable()
complex_name uniprot_1 uniprot_2 uniprot_3 uniprot_4 uniprot_5
12oxoLeukotrieneB4_byPTGR1 Q14914
TREM2_receptor Q9NZC2 O43914
RAreceptor_RARA_RXRA P10276 P19793 P29373
Pregnenolone_byCYP11A1 P05108 Q6P4F2 P10109 P22570
5HTR3_complex P46098 O95264 Q8WXA8 Q70Z44 A5X5Y0
CPDB.complex2protein %>% 
  mutate(nproteins_per_complex = rowSums(!is.na(across(starts_with("uniprot"))))) %>% 
  tabyl(nproteins_per_complex) %>% 
  knitr::kable(align = rep("l", nrow(.)))
nproteins_per_complex n percent
1 61 0.1685083
2 261 0.7209945
3 36 0.0994475
4 3 0.0082873
5 1 0.0027624


There are 362 complexes and each complex can include up to 5 proteins (but typically 2).

2.2. Add a unique key for matching

To create the complex key, we concatenated the subunits of heteromeric complexes into a vector, removed NAs within, and sorted the subunits before collapsing them into a single character value. We found that sorting eases the mapping process computationally between CellPhoneDB and CellChat (Before sort: < 250 mapped; After sort: > 290 mapped).

CPDB.complex2protein <- CPDB.complex2protein %>% 
  rowwise() %>% 
  mutate(complex_key = paste(
    sort(na.omit(c(uniprot_1, uniprot_2, uniprot_3, uniprot_4, uniprot_5))), 
    collapse = "_")) %>% 
  ungroup()
CPDB.complex2protein %>% 
  filter(complex_name %in% example) %>% 
  mutate(complex_name = factor(complex_name, levels = example)) %>% 
  arrange(complex_name) %>% 
  knitr::kable()
complex_name uniprot_1 uniprot_2 uniprot_3 uniprot_4 uniprot_5 complex_key
12oxoLeukotrieneB4_byPTGR1 Q14914 Q14914
TREM2_receptor Q9NZC2 O43914 O43914_Q9NZC2
RAreceptor_RARA_RXRA P10276 P19793 P29373 P10276_P19793_P29373
Pregnenolone_byCYP11A1 P05108 Q6P4F2 P10109 P22570 P05108_P10109_P22570_Q6P4F2
5HTR3_complex P46098 O95264 Q8WXA8 Q70Z44 A5X5Y0 A5X5Y0_O95264_P46098_Q70Z44_Q8WXA8

2.3. Mapping CellChatDB complexes to Uniprot IDs

CellChat version 2.2.0 stores the database within R under CellChatDB.human.

pacman::p_version("CellChat")
[1] '2.2.0'
CCDB.complex2gene <- CellChatDB.human$complex %>% 
  mutate(complex_name = rownames(.)) %>% 
  rename_with( ~ gsub("subunit", "symbol", .)) %>% 
  relocate(complex_name) %>% 
  remove_rownames()

example <- c("12oxoLTB4-PTGR1", "TREM2_TYROBP", "RARA_RXRA_CRABP2", 
             "Pregnenolone-CYP11A1", "HTR3_complex")

CCDB.complex2gene %>% dim()
[1] 338   6
CCDB.complex2gene %>% 
  filter(complex_name %in% example) %>% 
  knitr::kable()
complex_name symbol_1 symbol_2 symbol_3 symbol_4 symbol_5
12oxoLTB4-PTGR1 PTGR1
TREM2_TYROBP TREM2 TYROBP
RARA_RXRA_CRABP2 RARA RXRA CRABP2
Pregnenolone-CYP11A1 CYP11A1 FDX2 FDX1 FDXR
HTR3_complex HTR3A HTR3B HTR3C HTR3D HTR3E


We need to map the gene symbols to uniprot IDs to make this database compatible with CellPhoneDB.

CCDB.protein2gene <- CellChatDB.human$geneInfo %>% 
  select(EntryID.uniprot, Symbol)

CCDB.protein2gene %>% dim()
[1] 26827     2
CCDB.protein2gene %>% head()
# A tibble: 6 × 2
  EntryID.uniprot Symbol 
  <chr>           <chr>  
1 P04217          A1BG   
2 Q9NQ94          A1CF   
3 P01023          A2M    
4 A8K2U0          A2ML1  
5 U3KPV4          A3GALT2
6 Q9NPC4          A4GALT 
## Map gene symbol to uniprot
CCDB.complex2protein <- CCDB.complex2gene %>% 
  pivot_longer(cols = !complex_name) %>% 
  left_join(CCDB.protein2gene, by = c("value" = "Symbol")) %>% 
  select(complex_name, name, EntryID.uniprot) %>% 
  pivot_wider(id_cols = complex_name, names_from = "name", values_from = "EntryID.uniprot") %>% 
  rename_with( ~ gsub("symbol", "uniprot", .))

CCDB.complex2protein %>% 
  filter(complex_name %in% example) %>% 
  knitr::kable()
complex_name uniprot_1 uniprot_2 uniprot_3 uniprot_4 uniprot_5
12oxoLTB4-PTGR1 Q14914
TREM2_TYROBP Q9NZC2 O43914
RARA_RXRA_CRABP2 P10276 P19793 P29373
Pregnenolone-CYP11A1 P05108 Q6P4F2 P10109 P22570
HTR3_complex P46098 O95264 Q8WXA8 Q70Z44 A5X5Y0


Again, we can count the number of proteins per complex:

CCDB.complex2protein %>% 
  mutate(nproteins_per_complex = rowSums(!is.na(across(starts_with("uniprot"))))) %>% 
  tabyl(nproteins_per_complex) %>% 
  knitr::kable(align = rep("l", nrow(.)))
nproteins_per_complex n percent
1 50 0.1479290
2 250 0.7396450
3 33 0.0976331
4 4 0.0118343
5 1 0.0029586


Now, let’s add a key to help with the comparison, using the same logic from CellPhoneDB.

CCDB.complex2protein <- CCDB.complex2protein %>% 
  rowwise() %>% 
  mutate(complex_key = paste(
    sort(na.omit(c(uniprot_1, uniprot_2, uniprot_3, uniprot_4, uniprot_5))), 
    collapse = "_")) %>% 
  ungroup()
CCDB.complex2protein %>% 
  filter(complex_name %in% example) %>%
  knitr::kable()
complex_name uniprot_1 uniprot_2 uniprot_3 uniprot_4 uniprot_5 complex_key
12oxoLTB4-PTGR1 Q14914 Q14914
TREM2_TYROBP Q9NZC2 O43914 O43914_Q9NZC2
RARA_RXRA_CRABP2 P10276 P19793 P29373 P10276_P19793_P29373
Pregnenolone-CYP11A1 P05108 Q6P4F2 P10109 P22570 P05108_P10109_P22570_Q6P4F2
HTR3_complex P46098 O95264 Q8WXA8 Q70Z44 A5X5Y0 A5X5Y0_O95264_P46098_Q70Z44_Q8WXA8

2.4. Duplicate keys

Before we merge, let us pull out the duplicate keys within each database for manual curation.

CPDB.complex2protein.uniq <- CPDB.complex2protein %>% 
  add_count(complex_key) %>% 
  filter(n == 1) %>% 
  select(-n) 

CPDB.complex2protein.dup <- CPDB.complex2protein %>% 
  add_count(complex_key) %>% 
  filter(n > 1) %>% 
  select(-n) %>% 
  arrange(complex_key)

Duplicate complex keys in CellPhoneDB

CPDB.complex2protein.dup %>% 
  mutate(nproteins_per_complex = rowSums(!is.na(across(starts_with("uniprot"))))) %>% 
  arrange(nproteins_per_complex) %>% 
  select(-nproteins_per_complex) %>% 
  knitr::kable()
complex_name uniprot_1 uniprot_2 uniprot_3 uniprot_4 uniprot_5 complex_key
SerotoninDopamin_byDDC P20711 P20711
b-PEA_byDDC P20711 P20711
Desmosterol_byDHCR7 Q9UBM7 Q9UBM7
Cholesterol_byDHCR7 Q9UBM7 Q9UBM7
IL28_receptor Q8IU57 Q08334 Q08334_Q8IU57
Type_III_IFNR Q08334 Q8IU57 Q08334_Q8IU57
5SHpETE_byALOX5 P09917 Q16873 P20292 P09917_P20292_Q16873
LeukotrieneC4_byLTC4S P20292 P09917 Q16873 P09917_P20292_Q16873
LipoxinA4_byALOX5 P09917 P20292 Q16873 P09917_P20292_Q16873
CCDB.complex2protein.uniq <- CCDB.complex2protein %>% 
  add_count(complex_key) %>% 
  filter(n == 1) %>% 
  select(-n)

CCDB.complex2protein.dup <- CCDB.complex2protein %>% 
  add_count(complex_key) %>% 
  filter(n > 1) %>% 
  select(-n) %>% 
  arrange(complex_key)

Duplicate complex keys in CellChat

CCDB.complex2protein.dup %>% 
  mutate(nproteins_per_complex = rowSums(!is.na(across(starts_with("uniprot"))))) %>% 
  arrange(nproteins_per_complex) %>% 
  select(-nproteins_per_complex) %>% 
  knitr::kable()
complex_name uniprot_1 uniprot_2 uniprot_3 uniprot_4 uniprot_5 complex_key
Cholesterol-DHCR7 Q9UBM7 Q9UBM7
Desmosterol-DHCR7 Q9UBM7 Q9UBM7
CALCRL_RAMP1 P30988 O60894 O60894_P30988
CALCR_RAMP1 P30988 O60894 O60894_P30988
CALCRL_RAMP2 P30988 O60895 O60895_P30988
CALCR_RAMP2 P30988 O60895 O60895_P30988
CALCRL_RAMP3 P30988 O60896 O60896_P30988
CALCR_RAMP3 P30988 O60896 O60896_P30988
ITGA1_ITGB1 P56199 P05556 P05556_P56199
ITGB1_ITGA1 P05556 P56199 P05556_P56199
KLRD1_KLRC1 Q13241 P26715 P26715_Q13241
CD94:NKG2A Q13241 P26715 P26715_Q13241
KLRD1_KLRC2 Q13241 P26717 P26717_Q13241
CD94:NKG2C Q13241 P26717 P26717_Q13241
KLRK1_HCST P26718 Q9UBK5 P26718_Q9UBK5
NKG2D_HCST P26718 Q9UBK5 P26718_Q9UBK5
LTC4-LTC4S P20292 P09917 Q16873 P09917_P20292_Q16873
LXA4-ALOX5 P09917 P20292 Q16873 P09917_P20292_Q16873


Let’s merge the unique records

combined_unique <- full_join(CCDB.complex2protein.uniq, CPDB.complex2protein.uniq,
                             by = "complex_key", 
                             suffix = c(".CC", ".CP")) %>% 
                   relocate("complex_key")
combined_unique


Save the output as excel file for manual mapping

openxlsx::write.xlsx( 
  list( partial  = combined_unique,
        CCDB_dup = CCDB.complex2protein.dup,
        CPDB_dup = CPDB.complex2protein.dup), 
  file = "matched_partial_tmp.xlsx")

After manually mapping complexes to both databases and assigning them to a unique complex key, the mapped file contains six sheets listed as such:

excel_sheets("matched_partial.xlsx")
[1] "partial"       "CCDB_dup"      "CPDB_dup"      "partial match"
[5] "mapped"        "clean"        

How duplicate keys were handled

CellPhoneDB

CPDB.dup <- read_excel("matched_partial.xlsx", sheet = "CPDB_dup") 

CPDB.dup %>% 
  mutate(nproteins_per_complex = rowSums(!is.na(across(starts_with("uniprot"))))) %>% 
  arrange(nproteins_per_complex) %>% 
  select(-nproteins_per_complex) %>% 
  knitr::kable()
complex_name uniprot_1 uniprot_2 uniprot_3 uniprot_4 uniprot_5 complex_key proposed match found
b-PEA_byDDC P20711 P20711 no
SerotoninDopamin_byDDC P20711 P20711 yes
Desmosterol_byDHCR7 Q9UBM7 Q9UBM7 yes
Cholesterol_byDHCR7 Q9UBM7 Q9UBM7 yes
IL28_receptor Q8IU57 Q08334 Q08334_Q8IU57 yes
Type_III_IFNR Q08334 Q8IU57 Q08334_Q8IU57 Q8IU57_Q08334 yes
5SHpETE_byALOX5 P09917 Q16873 P20292 P09917_P20292_Q16873 P09917_Q16873_P20292 no
LeukotrieneC4_byLTC4S P20292 P09917 Q16873 P09917_P20292_Q16873 P20292_P09917_Q16873 yes
LipoxinA4_byALOX5 P09917 P20292 Q16873 P09917_P20292_Q16873 P09917_P20292_Q16873 yes

CellChat

CCDB.dup <- read_excel("matched_partial.xlsx", sheet = "CCDB_dup")

CCDB.dup %>% 
  mutate(nproteins_per_complex = rowSums(!is.na(across(starts_with("uniprot"))))) %>% 
  arrange(nproteins_per_complex) %>% 
  select(-nproteins_per_complex) %>% 
  knitr::kable()
complex_name uniprot_1 uniprot_2 uniprot_3 uniprot_4 uniprot_5 complex_key proposed match found
Cholesterol-DHCR7 Q9UBM7 Q9UBM7 yes
Desmosterol-DHCR7 Q9UBM7 Q9UBM7 yes
KLRD1_KLRC1 Q13241 P26715 P26715_Q13241 Unused in CellChat
KLRD1_KLRC2 Q13241 P26717 P26717_Q13241 Unused in CellChat
KLRK1_HCST P26718 Q9UBK5 P26718_Q9UBK5 Unused in CellChat
CALCRL_RAMP1 P30988 O60894 O60894_P30988 O60894_Q16602 yes
CALCR_RAMP1 P30988 O60894 O60894_P30988 yes
CALCRL_RAMP2 P30988 O60895 O60895_P30988 O60895_Q16602 yes
CALCR_RAMP2 P30988 O60895 O60895_P30988 yes
CALCRL_RAMP3 P30988 O60896 O60896_P30988 O60896_Q16602 yes
CALCR_RAMP3 P30988 O60896 O60896_P30988 yes
ITGA1_ITGB1 P56199 P05556 P05556_P56199 yes
ITGB1_ITGA1 P05556 P56199 P05556_P56199 yes
CD94:NKG2A Q13241 P26715 P26715_Q13241 yes
CD94:NKG2C Q13241 P26717 P26717_Q13241 yes
NKG2D_HCST P26718 Q9UBK5 P26718_Q9UBK5 yes
LTC4-LTC4S P20292 P09917 Q16873 P09917_P20292_Q16873 P20292_P09917_Q16873 yes
LXA4-ALOX5 P09917 P20292 Q16873 P09917_P20292_Q16873 P09917_P20292_Q16873 yes


We include a proposed complex key to differentiate complexes composed of the same subunits based on:
(1) functional context the complex operates (e.g. same complex producing different non-protein compounds)
(2) false annotation of uniprot ID to its corresponding gene (e.g. P30988 (CALCR) replaced to Q16602 (CALCRL) for CALCRL-containing complexes).

Load the complete curated file

complex <- read_excel("matched_partial.xlsx", sheet = "clean")

complex %>%  dim()
[1] 394  10

Mapped complexes file with the above example complexes for illustration

Each complex in CellChat and CellPhoneDB is now assigned a unique complex key

complex %>% 
  filter(CellChat_name %in% example) %>% 
  mutate(CellChat_name = factor(CellChat_name, levels = example)) %>% 
  arrange(CellChat_name) %>% 
  knitr::kable()
complex_key CellChat_name CellPhoneDB_name uniprot_1 uniprot_2 uniprot_3 uniprot_4 uniprot_5 match Note
Q14914 12oxoLTB4-PTGR1 12oxoLeukotrieneB4_byPTGR1 Q14914 exact
O43914_Q9NZC2 TREM2_TYROBP TREM2_receptor Q9NZC2 O43914 exact
P10276_P19793_P29373 RARA_RXRA_CRABP2 RAreceptor_RARA_RXRA P10276 P19793 P29373 exact
P05108_P10109_P22570_Q6P4F2 Pregnenolone-CYP11A1 Pregnenolone_byCYP11A1 P05108 Q6P4F2 P10109 P22570 exact
A5X5Y0_O95264_P46098_Q70Z44_Q8WXA8 HTR3_complex 5HTR3_complex P46098 O95264 Q8WXA8 Q70Z44 A5X5Y0 exact

Breakdown of complexes in the final list

complex %>% 
  tabyl(match) %>% 
  arrange(desc(n))
                     match   n    percent
                     exact 296 0.75126904
 Found only in CellPhoneDB  55 0.13959391
    Found only in CellChat  31 0.07868020
                    manual   8 0.02030457
             partial match   4 0.01015228


Following manual curation of the complexes and mapping based on uniprot IDs, the break down of the matching process is as shown above. We display the manually-mapped complexes, and partial match complexes which we define as sharing at least two subunits.

complex %>% 
  filter(match %in% c("manual", "partial match")) %>% 
  mutate(nproteins_per_complex = rowSums(!is.na(across(starts_with("uniprot"))))) %>% 
  arrange(nproteins_per_complex) %>% 
  select(-nproteins_per_complex)

Click on the rows to view information in the Note column.

Full complex mapping table

complex

3. Gene Mapping

We map genes in both databases to their respective uniprot IDs for consistency.

3.1. Mapping CellPhoneDB genes to Uniprot IDs

Preprocessing data

rm(list = ls())
options(knitr.kable.NA = NA) # allow NA in kable from this section

CPDB.name2uniprot <- read_delim("data/uniprot_synonyms.tsv") %>% 
  clean_names() %>% 
  select(entry_name, entry)

CPDB.name2uniprot %>% dim()
[1] 207304      2
CPDB.name2uniprot %>% 
  head()
# A tibble: 6 × 2
  entry_name       entry     
  <chr>            <chr>     
1 A0A024QZ08_HUMAN A0A024QZ08
2 A0A024QZ26_HUMAN A0A024QZ26
3 A0A024QZ86_HUMAN A0A024QZ86
4 A0A024QZA8_HUMAN A0A024QZA8
5 A0A024QZB8_HUMAN A0A024QZB8
6 A0A024QZQ1_HUMAN A0A024QZQ1
CPDB.gene2name <- read_csv("data/gene_table.csv") %>% 
  arrange(desc(ensembl)) %>% 
  distinct(protein_id, .keep_all = T) 

CPDB.gene2name %>% dim()
[1] 1355    6
CPDB.gene2name %>% 
  head() %>% 
  knitr::kable(align = rep("l", nrow(.)))
id_gene ensembl gene_name hgnc_symbol protein_name protein_id
472 ENSG00000286070 GGT1 GGT1 GGT1_HUMAN 396
989 ENSG00000285456 LTB4R LTB4R LT4R1_HUMAN 782
1240 ENSG00000285203 LTB4R2 LTB4R2 LT4R2_HUMAN 984
1071 ENSG00000285070 ADIPOR2 ADIPOR2 ADR2_HUMAN 843
500 ENSG00000284816 EPHA1 EPHA1 EPHA1_HUMAN 419
715 ENSG00000284510 KIR2DL3 KIR2DL3 KI2L3_HUMAN 592


We want to map protein_name in CPDB.gene2name to entry (uniprot ID) using CPDB.name2uniprot.

Addressing invalid data points (i.e. non-uniprot protein name, missing entries)

Some proteins did not have a valid uniprot name (no “_HUMAN” in the protein name). Therefore, we manually added the uniprot name with reference from https://www.uniprot.org/.

CPDB.gene2name$protein_name[!grepl("HUMAN$", CPDB.gene2name$protein_name)]
[1] "CCL3L1" "HLAC"   "HLAA"   "HLAB"   "IFNA1" 
# fill in missing entry names
CPDB.gene2name <- CPDB.gene2name %>% 
  mutate(protein_name = case_when(
    protein_name %in% c("HLAA", "HLAB", "HLAC") ~ paste0(protein_name, "_HUMAN"),
    protein_name == "CCL3L1" ~ "CL3L1_HUMAN",
    protein_name == "IFNA1" ~ "L0N195_HUMAN",
    .default = protein_name
  )) %>% 

# mapping from uniprot protein name to the corresponding uniprot IDs
  left_join(CPDB.name2uniprot %>% 
    select(entry, entry_name), by = c("protein_name" = "entry_name"))

CPDB.gene2name %>% dim()
[1] 1355    7
CPDB.gene2name %>% 
  head() %>% 
  knitr::kable(align = rep("l", nrow(.)))
id_gene ensembl gene_name hgnc_symbol protein_name protein_id entry
472 ENSG00000286070 GGT1 GGT1 GGT1_HUMAN 396 P19440
989 ENSG00000285456 LTB4R LTB4R LT4R1_HUMAN 782 Q15722
1240 ENSG00000285203 LTB4R2 LTB4R2 LT4R2_HUMAN 984 Q9NPC1
1071 ENSG00000285070 ADIPOR2 ADIPOR2 ADR2_HUMAN 843 NA
500 ENSG00000284816 EPHA1 EPHA1 EPHA1_HUMAN 419 P21709
715 ENSG00000284510 KIR2DL3 KIR2DL3 KI2L3_HUMAN 592 P43628
rm(CPDB.name2uniprot)

Using biomaRt database to map from Ensembl ID to uniprot ID

For genes without a successful map, we used the biomaRt database to map the genes to their corresponding uniprot IDs via an Ensembl-to-Uniprot mapping.

na.genes <- CPDB.gene2name$ensembl[is.na(CPDB.gene2name$entry)]
na.genes
 [1] "ENSG00000285070" "ENSG00000283448" "ENSG00000203722" "ENSG00000177707"
 [5] "ENSG00000164520" "ENSG00000159346" "ENSG00000155918" "ENSG00000143217"
 [9] "ENSG00000131019" "ENSG00000131015" "ENSG00000130202" "ENSG00000123146"
[13] "ENSG00000111981" "ENSG00000110400" "ENSG00000101000"
# Load biomaRt database
mart <- useMart('ENSEMBL_MART_ENSEMBL')
mart <- useDataset('hsapiens_gene_ensembl', mart)

# listAttributes(mart) %>% view()

annotLookup <- getBM(
  mart = mart,
  attributes = c(
  'hgnc_symbol',
  'ensembl_gene_id',
  'uniprotswissprot'),
  uniqueRows = TRUE) %>% 
  arrange(desc(uniprotswissprot)) %>% 
  distinct(ensembl_gene_id, .keep_all = T) %>% 
  filter(uniprotswissprot != "")

annotLookup %>% dim()
[1] 21801     3
annotLookup %>% head()
  hgnc_symbol ensembl_gene_id uniprotswissprot
1      SYT15B ENSG00000288175           X6R8R1
2      SYT15B ENSG00000277758           X6R8R1
3      CIMIP3 ENSG00000287363           X6R8D5
4       PYDC5 ENSG00000289721           W6CW81
5      SPACA6 ENSG00000182310           W5XKT8
6     A3GALT2 ENSG00000184389           U3KPV4
CPDB.gene2name <- CPDB.gene2name %>% 
  left_join(annotLookup %>% 
    select(-hgnc_symbol), by = c("ensembl" = "ensembl_gene_id")) %>% 
  mutate(entry = ifelse(ensembl %in% na.genes, uniprotswissprot, entry),
         agree = ifelse(entry == uniprotswissprot, T, F))

CPDB.gene2name %>% dim()
[1] 1355    9
CPDB.gene2name %>% 
  head() %>% 
  knitr::kable(align = rep("l", nrow(.)))
id_gene ensembl gene_name hgnc_symbol protein_name protein_id entry uniprotswissprot agree
472 ENSG00000286070 GGT1 GGT1 GGT1_HUMAN 396 P19440 NA NA
989 ENSG00000285456 LTB4R LTB4R LT4R1_HUMAN 782 Q15722 Q15722 TRUE
1240 ENSG00000285203 LTB4R2 LTB4R2 LT4R2_HUMAN 984 Q9NPC1 Q9NPC1 TRUE
1071 ENSG00000285070 ADIPOR2 ADIPOR2 ADR2_HUMAN 843 Q86V24 Q86V24 TRUE
500 ENSG00000284816 EPHA1 EPHA1 EPHA1_HUMAN 419 P21709 P21709 TRUE
715 ENSG00000284510 KIR2DL3 KIR2DL3 KI2L3_HUMAN 592 P43628 P43628 TRUE
rm(na.genes, mart, annotLookup)

Manual check for genes that do not match

CPDB.gene2name %>% 
  tabyl(agree) %>% 
  knitr::kable()
agree n percent valid_percent
FALSE 7 0.0051661 0.0052045
TRUE 1338 0.9874539 0.9947955
NA 10 0.0073801 NA
false.genes <- na.omit(CPDB.gene2name$ensembl[CPDB.gene2name$agree == F])

CPDB.gene2name %>% 
  filter(ensembl %in% false.genes) %>% 
  knitr::kable(align = rep("l", nrow(.)))
id_gene ensembl gene_name hgnc_symbol protein_name protein_id entry uniprotswissprot agree
29 ENSG00000197919 IFNA1 IFNA1 L0N195_HUMAN 182 L0N195 P01562 FALSE
1579 ENSG00000179915 NRXN1 NRXN1 NRX1B_HUMAN 1275 P58400 Q9ULB1 FALSE
1742 ENSG00000179915 NRXN1 NRXN1 NRX1A_HUMAN 1274 Q9ULB1 Q9ULB1 TRUE
1759 ENSG00000171867 PRNP PRNP APRIO_HUMAN 1173 F7VJQ1 P04156 FALSE
185 ENSG00000110680 CALCA CALCA CALC_HUMAN 171 P01258 P06881 FALSE
1580 ENSG00000110076 NRXN2 NRXN2 NRX2B_HUMAN 1276 P58401 Q9P2S2 FALSE
37 ENSG00000101307 SIRPB1 SIRPB1 SIRB1_HUMAN 45 O00241 Q5TFQ8 FALSE
1705 ENSG00000021645 NRXN3 NRXN3 NRX3B_HUMAN 1277 Q9HDB5 Q9Y4C0 FALSE
# Manual check
# ENSG00000197919 use uniprotswissprot
# ENSG00000179915, ENSG00000171867, ENSG00000110680, ENSG00000110076, ENSG00000101307, ENSG00000021645 unchanged

CPDB.gene2uniprot <- CPDB.gene2name %>% 
  mutate(entry = ifelse(ensembl == "ENSG00000197919", uniprotswissprot, entry)) %>% 
  rename(uniprot_id = entry) %>% 
  select(-uniprotswissprot, - agree)

write_csv(CPDB.gene2uniprot, file = "results/gene_cellphonedb.csv")

CPDB.gene2uniprot %>% dim()
[1] 1355    7

Full CellPhoneDB gene mapping table

CPDB.gene2uniprot
rm(false.genes, CPDB.gene2name, CPDB.gene2uniprot)

3.2. Mapping CellChat genes to Uniprot IDs

CCDB.gene2uniprot <- CellChatDB.human$geneInfo %>% 
  clean_names()

na.genes <- CCDB.gene2uniprot$symbol[(is.na(CCDB.gene2uniprot$entry_id_uniprot))]

na.genes
[1] "CCL3L1" "CCL3L3" "GPR1"  
CCDB.gene2uniprot <- CCDB.gene2uniprot %>% 
  mutate(entry_id_uniprot = case_when(
    symbol %in% "CCL3L1" ~ "P16619",
    symbol == "CCL3L3" ~ "P16619_L3", # CCL3L1 & CCL3L3 mapped to same gene; differentiate with "_L3" for CCL3L3!
    symbol == "GPR1" ~ "P46091",
    .default = entry_id_uniprot
))

CCDB.gene2uniprot %>% dim()
[1] 26827     9

Full CellChat gene mapping table

CCDB.gene2uniprot

Click on the rows if you wish to extract full information on protein_name, keywords and location.

write_csv(CCDB.gene2uniprot, file = "results/gene_cellchat.csv")

rm(na.genes, CCDB.gene2uniprot)

4. Interaction Database Mapping

4.1. Mapping CellPhoneDB interactions

CellPhoneDB interaction database is set up such that the interacting partners (multidata_1_id and multidata_2_id) are in the form of its database-specific IDs. Here, we trace them back to the uniprot form we assigned.

CPDB <- read_csv("data/interaction_table.csv")

CPDB %>% dim()
[1] 2911   10
CPDB %>% 
  head() %>% 
  knitr::kable(align = rep("l", nrow(.))) %>% 
  column_spec(c(3, 4), color = "red")
id_interaction id_cp_interaction multidata_1_id multidata_2_id source annotation_strategy is_ppi curator directionality classification
0 CPI-SC0A2DB962D 329 1507 PMID:12392763 curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Cadherin
1 CPI-SC0B5CEA47D 716 1507 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin
2 CPI-SC0C8B7BCBB 322 1507 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin
3 CPI-SC0D3C12C3F 343 1507 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin
4 CPI-SC0B86B7CED 930 1507 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin
5 CPI-SC0FA343CEF 810 1507 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin

Mapping CellPhoneDB’s interaction database (gene)

CPDB.gene <- read_csv("results/gene_cellphonedb.csv")

Recall CellPhoneDB gene mapping table here

CPDB <- CPDB %>% 
  # adding uniprot and gene names to interaction database
  left_join(CPDB.gene %>% 
    select(protein_id, uniprot_id, gene_name), by = c("multidata_1_id" = "protein_id")) %>% 
  mutate(multidata_1_id = ifelse(!is.na(uniprot_id), uniprot_id, multidata_1_id),
         partner_1 = ifelse(!is.na(gene_name), gene_name, NA)) %>% 
  select(-uniprot_id, -gene_name) %>% 
  
  left_join(CPDB.gene %>% 
    select(protein_id, uniprot_id, gene_name), by = c("multidata_2_id" = "protein_id")) %>% 
  mutate(multidata_2_id = ifelse(!is.na(uniprot_id), uniprot_id, multidata_2_id),
         partner_2 = ifelse(!is.na(gene_name), gene_name, NA)) %>% 
  select(-uniprot_id, -gene_name)

CPDB %>% dim()
[1] 2911   12
CPDB %>% 
  head() %>% 
  knitr::kable(align = rep("l", nrow(.))) %>% 
  column_spec(3, color = "green") %>% 
  column_spec(4, color = "red")
id_interaction id_cp_interaction multidata_1_id multidata_2_id source annotation_strategy is_ppi curator directionality classification partner_1 partner_2
0 CPI-SC0A2DB962D P12830 1507 PMID:12392763 curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Cadherin CDH1 NA
1 CPI-SC0B5CEA47D Q03692 1507 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin COL10A1 NA
2 CPI-SC0C8B7BCBB P12107 1507 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin COL11A1 NA
3 CPI-SC0D3C12C3F P13942 1507 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin COL11A2 NA
4 CPI-SC0B86B7CED Q99715 1507 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin COL12A1 NA
5 CPI-SC0FA343CEF Q5TAT6 1507 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin COL13A1 NA

Mapping CellPhoneDB’s interaction database (complex)

Load necessary data

multidata <- read_csv("data/multidata_table.csv") %>% 
  select(id_multidata, name)

multidata %>% dim()
[1] 1717    2
multidata %>% tail()
# A tibble: 6 × 2
  id_multidata name               
         <dbl> <chr>              
1         1711 TENM4_FLRT1_complex
2         1712 TENM4_FLRT3_complex
3         1713 hCGB1_complex      
4         1714 hCGB2_complex      
5         1715 hCGB3_complex      
6         1716 hCGB7_complex      
complex.map <- read_excel("matched_partial.xlsx", sheet = "clean") %>% 
  select(complex_key, CellChat_name, CellPhoneDB_name) %>% 
  left_join(multidata, by = c("CellPhoneDB_name" = "name")) %>% 
  mutate(id_multidata = as.character(id_multidata))

complex.map %>% dim()
[1] 394   4
complex.map %>% 
  head() %>% 
  knitr::kable()
complex_key CellChat_name CellPhoneDB_name id_multidata
A5X5Y0_O95264_P46098_Q70Z44_Q8WXA8 HTR3_complex 5HTR3_complex 1621
O00144_O75197 FZD9_LRP5 FZD9_LRP5 1458
O00144_O75581 FZD9_LRP6 FZD9_LRP6 1459
O00238_P27037 BMPR1B_ACVR2A BMR1B_AVR2A 1409
O00238_Q13705 BMPR1B_ACVR2B BMR1B_AVR2B 1410
O00238_Q13873 BMPR1B_BMPR2 BMPR1B_BMPR2 1405
rm(multidata)
CPDB <- CPDB %>% 
  # adding complex uniprot id and name of complexes
  left_join(complex.map %>% 
    select(complex_key, id_multidata, CellPhoneDB_name), by = c("multidata_1_id" = "id_multidata")) %>% 
  mutate(multidata_1_id = ifelse(!is.na(complex_key), complex_key, multidata_1_id),
         partner_1 = ifelse(!is.na(CellPhoneDB_name), CellPhoneDB_name, partner_1)) %>% 
  select(-complex_key, -CellPhoneDB_name) %>% 
  
  left_join(complex.map %>% 
    select(complex_key, id_multidata, CellPhoneDB_name), by = c("multidata_2_id" = "id_multidata")) %>% 
  mutate(multidata_2_id = ifelse(!is.na(complex_key), complex_key, multidata_2_id),
         partner_2 = ifelse(!is.na(CellPhoneDB_name), CellPhoneDB_name, partner_2)) %>% 
  select(-complex_key, -CellPhoneDB_name) %>% 
  distinct()

CPDB <- CPDB %>% 
  mutate(unique_int = paste(multidata_1_id, multidata_2_id, sep = "_"))

CPDB %>% dim()
[1] 2911   13

Now, all interacting partners in CellPhoneDB are in the form of uniprot IDs/complex keys:

CPDB %>% 
  head() %>% 
  knitr::kable(align = rep("l", nrow(.))) %>% 
  column_spec(c(3, 4), color = "green")
id_interaction id_cp_interaction multidata_1_id multidata_2_id source annotation_strategy is_ppi curator directionality classification partner_1 partner_2 unique_int
0 CPI-SC0A2DB962D P12830 P05556_P17301 PMID:12392763 curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Cadherin CDH1 integrin_a2b1_complex P12830_P05556_P17301
1 CPI-SC0B5CEA47D Q03692 P05556_P17301 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin COL10A1 integrin_a2b1_complex Q03692_P05556_P17301
2 CPI-SC0C8B7BCBB P12107 P05556_P17301 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin COL11A1 integrin_a2b1_complex P12107_P05556_P17301
3 CPI-SC0D3C12C3F P13942 P05556_P17301 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin COL11A2 integrin_a2b1_complex P13942_P05556_P17301
4 CPI-SC0B86B7CED Q99715 P05556_P17301 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin COL12A1 integrin_a2b1_complex Q99715_P05556_P17301
5 CPI-SC0FA343CEF Q5TAT6 P05556_P17301 uniprot curated TRUE RVentoTormo Adhesion-Adhesion Adhesion by Collagen/Integrin COL13A1 integrin_a2b1_complex Q5TAT6_P05556_P17301

4.2. Mapping CellChatDB interactions

We now do the same mapping in CellChat’s interaction database to convert the interacting partners to uniprot ID format.

Mapping CellChat’s interaction database (gene)

CCDB.gene <- read_csv("results/gene_cellchat.csv")

Recall CellChat gene mapping table here

# Name to Uniprot (Gene) ----
CCDB <- CellChatDB.human$interaction %>% 
  left_join(CCDB.gene %>% 
    select(entry_id_uniprot, symbol), by = c("ligand" = "symbol")) %>% 
  mutate(uniprot_ligand = ifelse(!is.na(entry_id_uniprot), entry_id_uniprot, ligand)) %>% 
  select(-entry_id_uniprot) %>% 

  left_join(CCDB.gene %>% 
    select(entry_id_uniprot, symbol), by = c("receptor" = "symbol")) %>% 
  mutate(uniprot_receptor = ifelse(!is.na(entry_id_uniprot), entry_id_uniprot, receptor)) %>% 
  select(-entry_id_uniprot)

Mapping CellChat’s interaction database (complex)

# Name to Uniprot (Complex) ----
CCDB <- CCDB %>% 
  left_join(complex.map %>% 
    select(complex_key, CellChat_name), by = c("ligand" = "CellChat_name")) %>% 
  mutate(uniprot_ligand = ifelse(!is.na(complex_key),complex_key, uniprot_ligand)) %>% 
  select(-complex_key) %>% 

  left_join(complex.map %>% 
    select(complex_key, CellChat_name), by = c("receptor" = "CellChat_name")) %>% 
  mutate(uniprot_receptor = ifelse(!is.na(complex_key), complex_key, uniprot_receptor)) %>% 
  select(-complex_key)

rm(CCDB.gene)

CCDB <- CCDB %>% 
  mutate(unique_int = paste(uniprot_ligand, uniprot_receptor, sep = "_")) %>% 
  relocate(interaction_name, pathway_name, ligand, uniprot_ligand, receptor, uniprot_receptor)

Note that three duplicated interactions (“POMC_OPRD1”, “POMC_OPRK1”, and “POMC_OPRM1”) were removed from CellChat’s database, and we will stick with its equivalents (“b-Endorphin-POMC_OPRD1”, “b-Endorphin-POMC_OPRK1”, “b-Endorphin-POMC_OPRM1”) respectively to match that in CellPhoneDB’s database.
Updated: 09/10/2025

CCDB <- CCDB %>% 
  filter(! interaction_name %in% c("POMC_OPRD1", "POMC_OPRK1", "POMC_OPRM1"))

CCDB %>% dim()
[1] 3230   31

View full CellChat interaction mapping table

CCDB

You may click on the rows if you wish to view information of the L-R location and keywords.

4.3. Overlap analysis of databases

We found that some interactions across the CellChat and CellPhoneDB’s interaction databases were the same, but their interacting partner were simply flipped.

e.g. Partner A was denoted the “ligand” in CellChat, but “receptor” in CellPhoneDB, with the corresponding flipped order for its partner B (“receptor” in CellChat and “ligand” in CellPhoneDB).

This is common for interactions that fall in the “Cell-Cell Contact” category. To identify these interactions, we extracted the L-R interactions that were found only in CellChat and kept interactions where both ligand and receptor are found in CellPhoneDB as well. This is because when at least one interacting partner is not found in CellPhoneDB, that interaction will not map to CellPhoneDB’s interaction database - flipped or not.

We looked at which interactions were mapped to CellPhoneDB after we perform a flip of CellChat’s interaction data, and assigned the ligand-receptor pair directionality in CellPhoneDB with respect to CellChat.

diff.int <- setdiff(CCDB$unique_int, CPDB$unique_int) # 1039 NOT in CellPhoneDBv5
cpdb.lr <- c(CPDB$multidata_1_id, CPDB$multidata_2_id) %>% 
  unique()

diff.df <- CCDB %>% 
  filter(unique_int %in% diff.int,
         uniprot_ligand %in% cpdb.lr, # keep only relevant to CellPhoneDB
         uniprot_receptor %in% cpdb.lr) 

diff.df %>% dim() # 410
[1] 410  31

There are 410 unmapped interactions in CellChat’s interaction database, where both ligand and receptor can be found in CellPhoneDB.

What happens to the number of unmapped interactions when we perform flip?

diff.df <- diff.df %>% 
  mutate(flipped = paste(uniprot_receptor, uniprot_ligand, sep = "_"))

new.diff <- setdiff(diff.df$flipped, CPDB$unique_int)
length(new.diff) # 362
[1] 362

After performing a flip of the interacting partners, we observe that the number of unmapped interaction now becomes 362. This indicates that after flip, 48 interactions mapped to CellPhoneDB’s interaction. Therefore, we say that these 48 interactions in CellPhoneDB have their partners reversed with respect to CellChat.

flip.df <- diff.df %>% 
  filter(!(flipped %in% new.diff)) %>% 
  select(-agonist, -antagonist, -starts_with("co_")) # remove cols with NA

flip.df %>% dim()
[1] 48 28
flip.df

You may click on the rows if you wish to view information of the L-R location and keywords.

Interactions to flip were under “Cell-Cell Contact” category

flip.df %>% 
  tabyl(annotation) %>% 
  knitr::kable(align = rep("l", ncol(.)))
annotation n percent
Cell-Cell Contact 48 1

Visualising overlap between CellPhoneDB and CellChat’s interaction database

to.flip <- flip.df$flipped

rm(diff.df, flip.df, cpdb.lr, diff.int, new.diff)

CPDB <- CPDB %>% 
  mutate(ligand     = ifelse(unique_int %in% to.flip, multidata_2_id, multidata_1_id),
         from       = ifelse(unique_int %in% to.flip, partner_2, partner_1), # gene / complex name
         receptor   = ifelse(unique_int %in% to.flip, multidata_1_id, multidata_2_id),
         to         = ifelse(unique_int %in% to.flip, partner_1, partner_2), # gene / complex name
         unique_int = paste(ligand, receptor, sep = "_")) %>% 
  select(-starts_with("partner"), - starts_with("multi"))

rm(to.flip)

ggvenn(list(CellPhoneDBv5 = CPDB$unique_int, CellChatDBv2 = CCDB$unique_int))

write_csv(CPDB, file = "results/mapped_cellphonedb.csv")
write_csv(CCDB, file = "results/mapped_cellchatdb.csv")

View full CellPhoneDB interaction mapping table

CPDB


View the full CellChat interaction mapping table back here!