Tutorial to create igraph objects from spatial data and calculate fragmentation indices with riverconn

Damiano Baldan, David Cunillera-Moncusi

2023-03-02

Content

This document is meant as a tutorial to assist in the process of creating a river network object from widely available geodatasets. The network-based description of river systems greatly facilitates the computation of indices to assess riverscape connectivity and fragmentation. In this tutorial, the package riverconn is used for the calculation of such indices. You can learn more on the package functionalities and architecture by reading the Baldan et al. (2022) paper. If you use this tutorial, do not forget to cite this paper.

1. Introduction

This document is a step-by-step guide to create an igraph object starting from commonly available geodata. Here, an example is provided using data for the Ebro river (Spain), based on widely available geodata from HydroSHEDS and public repositories from the Hydrographic Confederation of the Ebro river CHE.

The source code used to generate this tutorial is available on github in the file tutorial_network_creation.Rmd; additional functions used in the tutorial are stored in the file functions_network_creation.R.

After this tutorial is completed, the resulting igraph object is meant to be used as input to functions to assess riverscape fragmentation. The riverconn package (available on CRAN and on github) allows for the calculation of many fragmentation/connectivity indices. Most graph theory-based indices to asses fragmentation and connectivity of rivers (see Jumani et al., 2020 for a review) conceptualize the riverscape as a graph where vertices (nodes) represent reaches, i.e. distintictly separated river sections; while edges (links) represent splitting objects, such as confluences (where two reaches join), barriers (items that disrupt the longitudial connectivity of the river), or habitat patches boundaries (borders between sections of the river with relatively uniform habitat conditions). This tutorial focuses on barriers (dams), but in principle other fragmentation items can be dealt with similarly.

The igraph package implements routines for graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more. The packages allows for easy construction of igraph objects based on symbolic description of edges and vertices, edges and vertices lists, or adjacency matrices. The book Statistical Analysis of Network Data with R by Kolaczyk and Csardi (2014) offers a comprehensive tutorial on the possibilities offered by the igraph package, including network data manipulation and visualization, descriptive analysis of network graphs, mathematical and statistical modeling of graphs, and network topology inference.

1.1 Packages needed

This guide relies on several packages for data processing (the links direct to useful resources and tutorials):

  • sf, lwgeom, rgdal, raster, elevatr, RANN packages for obtaining and processing spatial data;
  • igraph package for performing network analyses;
  • tidyverse packages for efficient and easy-to-read data pipeline (see here and here for the most frequently used functions);
  • ggnetwork for producing easy, ggplot-like, plot of class igraph objects and ggspatial for easy, ggplot-like plot, of spatial data;
  • corrmorant to plot nice correlation plots;
  • riverconn to calculate the fragmentation indices for the riverscape.

Make sure those libraries are installed, updated, and loaded.

library(tidyverse)
library(sf)
library(raster)
library(ggspatial)
library(viridis)
library(igraph)
library(riverconn)
library(elevatr)
library(gridExtra)
library(ggnetwork)
library(lwgeom)
library(gridExtra)
library(corrmorant)
library(RANN)
library(ggpubr)
library(cowplot)

Most of the geospatial processing in this tutorial occurs on sf objects. Thanks to their data.frame-like structure, spatial operations (e.g., selecting and updating columns and rows, join with other data frames) are “easier” on such objects. Most spatial objects are converted to this class as soon as possible in the script.

1.2 Workflow steps

The workflow suggested in this document follows several steps:

  • data retrieval from existing datasets;
  • data pre-processing;
  • network creation;
  • indices calculation and comparison.

1.3 Sources of the spatial data

The construction of the igraph object requires at least some information on the spatial geometry of the river system and the position of the barriers.

The river network shapefile can be retrieved from HydroSHEDS, a global hydrographic atlas with geo-referenced datasets derived from spaceborn elevation data. HydroSHEDS is widely used for global to regional analyses (Lehner et al., 2008). River network data (HydroRIVERS) can be downloaded from here. Basins and catchments shapefiles (HydroBASINS) are also available with 12 detail levels (Lehner et al., 2013). The Ebro catchment border can be selected from the polygons available in the HydroBASINS layer (use level 5 basins shapefile for the identification) and the corresponding network can be extracted from the HydroSHEDS layer.

Other options are available for catchments and rivers. The European Catchments and Rivers network system (ECRINS) dataset provides good quality data covering most of Europe. The ECRINS data for rivers are structured with segments and nodes, making the creation of a network quite straight-forward. ECRINS has a higher resolution than HydroRIVERS. The recent Hydrography90m dataset can also be used. Local sources (e.g. hydrography layers from local authorities) are also generally available, but particular care should be taken in preprocessing to correct the presence of disconnected segments and loops in the shapefile. Different HydroSHEDS and HydroRIVERS versions are available. This tutorial uses HydroRIVERS v1.1: note that the previous versions were slighthly different in terms of attributes (for further details see the HydroRIVERS technical documentation).

The shapefile with the geographic position of the dams was obtained from the Ebro river public Geographic Information System CHE website.

The AMBER dataset described in Belletti et al. (2020) contains barriers data compiled from different national and international sources. Metadata on barriers type (weir, dam, culvert, etc) and height are available but most barriers are unclassified.

The corresponding shapefiles are available here and can be imported in R (note that the river shapefile “Ebre_riv_modif.shp” has been sligtly modified from the original hydrosheds shapefile -see next sections on the checks carried out on the shapefiles. The original shapefile is saved in the Ebro_shape_original/ folder for comparison).

shape_river <- st_read("Ebro_shape_corrected/Ebro_rivers.shp")
## Reading layer `Ebro_rivers' from data source 
##   `C:\Users\Damiano\Dropbox\riverconn_tutorial\Ebro_shape_corrected\Ebro_rivers.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 4985 features and 14 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: -4.3625 ymin: 40.3625 xmax: 2.116667 ymax: 43.15417
## Geodetic CRS:  WGS 84
shape_basin <- st_read("Ebro_catchment/Ebro_mask.shp")
## Reading layer `Ebro_mask' from data source 
##   `C:\Users\Damiano\Dropbox\riverconn_tutorial\Ebro_catchment\Ebro_mask.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 13 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -4.4 ymin: 40.34167 xmax: 2.154167 ymax: 43.17917
## Geodetic CRS:  WGS 84
shape_dams <- st_read("Ebro_dams/Embalses/Embalses.shp")
## Reading layer `Embalses' from data source 
##   `C:\Users\Damiano\Dropbox\riverconn_tutorial\Ebro_dams\Embalses\Embalses.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 224 features and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 408868.8 ymin: 4505766 xmax: 903207.1 ymax: 4766629
## Projected CRS: ETRS89 / UTM zone 30N

1.4 Functions to support data processing

Some custom functions were developed and were used in the data processing. The source code and the documentation for these functions is found in Appendix B. Before starting, make sure those functions are loaded in the global environment: get_outlet, create_network, snap_to_river, headwaters_dam, multiple_confluences, gaps_detection, check_components. Detailed information on the use of these functions is presented in the next sections.

2. Data pre-processing

Some data preprocessing is needed to prepare the input data for network construction.

2.1 Shapefiles processing

Shapefiles derived from the HydroRIVERS layer have a high detail level: a network with a high detail level will have a large amount of nodes and links, making the algorithms for network analysis slow. Thus it makes sense to reduce the size of the network by pruning the most upstream reaches. The information on the upstream area (suquare kilometers) is stored in the field ‘UPLAND_SKM’.

Note that different HydroRIVERS versions are available. This tutorial uses HydroRIVERS v1.1, wuere different fields are available for pruning the river network based on the size (e.g. ORD_STRA, ORD_CLAS, ORD_FLOW). Previous versions have different attributes. For instance, in HydroRIVERS v1.0 the field UP_CELLS (number of upstream cells) can be used to prune the river network. Check the dataset documentation for further details.

# Set a threshold of 80 square kilometers
threshold = 80

# Prune HydroRIVERS network based on upstream area
shape_river_small <- shape_river[as.numeric(shape_river$UPLAND_SKM) > threshold,]

Next, the dams shapefile can be converted to a shapefile and a unique id can be given to each point. As the dams dataset is a polygon shapefile the centroid of each poligon has to be calculated to obtain a “point” layer. The reference system of the dams layer is changed to WGS84, the same reference system of HydroSHEDS and HydroRIVERS.

dams_to_points <- shape_dams %>%
  st_as_sf %>%
  st_centroid %>%
  mutate(id = 1:nrow(.))%>% 
  dplyr::select(id)%>% 
  st_transform(crs = "+proj=longlat +datum=WGS84")

The shapefiles can be plotted to check there are no errors in the geolocalization of the data.

ggplot() +
  coord_fixed() +
  theme_minimal() +
  ggspatial::layer_spatial(shape_basin, fill = NA, color = "gray90") +
  ggspatial::layer_spatial(shape_river_small, aes(color = log10(UPLAND_SKM)) )+
  ggspatial::layer_spatial(dams_to_points) +
  scale_color_viridis(direction = -1, name= "upstream area \n(log10[Km^2])") +
  theme(legend.position = "bottom") +
  ggspatial::annotation_scale(location = "bl", style = "ticks") +
  ggspatial::annotation_north_arrow(location = "br")+
  labs(caption = "Black dots are the position of the dams")

2.2 Confluences processing

Every river network contains confluences, i.e. points where two reaches merge into one. In the river graph defined above, confluences are edges (links) between nodes. To define confluences the pruned river shapefile is first simplified and its geometry is casted to “POINT”. Note that for large shapefiles the function rgeos::gLineMerge might be more efficient than sf::st_union.

# Simplify river shapefile
shape_river_simple <- shape_river_small %>%
  st_as_sf %>%
  st_union()

# Convert shapefile to point object
river_to_points <- shape_river_simple %>%
  st_as_sf %>%
  st_cast("POINT") %>%
  mutate(id = 1:nrow(.))

Since points are created at the extremes of every segment in the original shapefile, a confluence is delineated where three (or more) of such nodes are overlapping.

# Check points that overlap
joins_selection <- river_to_points %>%
  st_equals() %>%
  Filter(function(x){length(x) > 2}, .) %>%
  lapply(., FUN = min) %>%
  unlist() %>%
  unique()

# Filter original point shapefile to retain only confluences
river_joins <- river_to_points %>% 
  filter(id %in% joins_selection)

The last step is the splitting of the simplified shapefile where it intersects a confluence. This allows to identify unequivocally the river sections that are comprised between two confluences. Such elements will then became vertices (nodes) of the graph. The function lwgeom::st_split is used. This function splits an object of class ‘sf’ with ‘MULTILINESTRING’ geometry into and object of class ‘sf’ containing multiple ‘LINESTRING’ geometries records based on the position of the points. Note that for this function to work properly, the point object must perfectly overlap with the polyline (this can be checked with the function sf::st_distance applied to the points and the polyline, which must return a vector of zeroes).

# Split polyline
shape_river_simplified <- lwgeom::st_split(shape_river_simple, river_joins) %>%
  st_collection_extract(.,"LINESTRING") %>%
  data.frame(id = 1:length(.), geometry = .) %>%
  st_as_sf() %>%
  mutate(length = st_length(.))

Results can be checked with a plot.

ggplot() +
  coord_fixed() +
  ggspatial::layer_spatial(shape_river_simplified, aes(color = id))+
  scale_color_viridis(direction = -1, name= "Reach ID") +
  ggspatial::layer_spatial(river_joins, shape = 1)+
  theme_minimal() +
  theme(legend.position = "bottom")+
  ggspatial::annotation_scale(location = "bl", style = "ticks") +
  ggspatial::annotation_north_arrow(location = "br")+
  labs(caption = "Hollow points are the position of the junctions")

2.3 Dams processing

Dams are processed in a way similar to confluences. The general idea is to obtain a shapefile that is sliced in points where dams or confluences are located. However, dams shapefiles require special care in the processing of the data. First, dams need to be snapped to the river network. Second, dams that are too far away from the river network need to be removed from the analysis because they are probably located in smaller streams that were removed beforehand.

The function snap_to_river (see Appendix B) is used to snap the dams points to the river network based on a tolerance threshold (here, 1000 m were used). After this function, the distance of snapped dams from the river network can be checked to make sure the algorithm worked properly.

# Snap dams
dams_snapped <- snap_to_river(dams_to_points,
                              shape_river_simple %>% st_sf(),
                              max_dist = 1000)
# Retain dams that were snapped
dams_snapped_reduced <-
  dams_snapped[st_contains(shape_river_simple %>% st_sf(), dams_snapped, prepared = FALSE)[[1]],]

# Check if dams were snapped properly to the network (all the distances should be zero)
st_distance(dams_snapped_reduced, shape_river_simple) %>% sum
## 0 [m]

The final step is to check if two dams were snapped in the same geographic point, and to assign passablity values to the obtained snapped points. Here, uniform (not dam-dependent) passabilities were assigned: 0.8 for downstream passability, and 0.1 for upstream passability. An id is assigned to each of the geographic points. Finally, each dam is assigned the value of the upstream area with a spatial join with the original river shapefile (HydroSheds expresses the upstream area as number of cells).

dams_snapped_reduced_joined <- dams_snapped_reduced %>%
  mutate(cluster =
           st_equals(dams_snapped_reduced, dams_snapped_reduced) %>%
           sapply(., FUN = function(x) paste0(x, collapse = "_"))) %>%
  group_by(cluster) %>%
  slice(1) %>%
  ungroup() %>%
  mutate(id_dam = as.character(1:nrow(.)), pass_u = 0.1, pass_d = 0.8) %>%
  as.data.frame %>%
  st_as_sf() %>%
  st_join(., shape_river_small, join = st_is_within_distance, dist = 10 ) %>% 
  group_by(id) %>%
  slice(which.max(UPLAND_SKM)) %>% 
  ungroup()
  

The function headwaters_dam (see Appendix B) can identify dams that are located in the headwaters, meaning that they do not have any river network segment upstream. When no upstream segment is present, the network can not be created (because the dam would be an edge without an upstream vertex). When calling the flag_headwater you should get a column with only “FALSE” statements.

headwaters_checking <- headwaters_dam(dams_snapped_reduced_joined, shape_river_simple)
head(headwaters_checking$flag_headwater)
##       [,1]
## [1,] FALSE
## [2,] FALSE
## [3,] FALSE
## [4,] FALSE
## [5,] FALSE
## [6,] FALSE

Results can be checked with a plot.

ggplot() +
  coord_fixed() +
  ggspatial::layer_spatial(shape_river_simple, color = "gray70")+
  ggspatial::layer_spatial(dams_snapped_reduced, shape = 1) +
  theme_minimal() +
  theme(legend.position = "bottom")+
  ggspatial::annotation_north_arrow(location = "br")+
  ggspatial::annotation_scale(location = "bl", style = "ticks") +
  labs(caption = "Hollow points are the position of the dams")

2.4 Shapefile slicing

The last step is to slice the river shapefile based on both the position of dams and confluences. First a combined ‘junctions’ dataset is created (the network_links object), then the function lwgeom::st_split is applied again. note that the confluences are assigned NA values to the fields present in the dams shapefile. Note also that in the river network shapefile, the field length is calculated. Finally, a spatial join is performed with the original river shapefile to assign to each reach the corresponding upstream area.

The field NodeID is created in the river_net_simplified object. This field will be used later for linking graph properties back to the shapefile for further plotting.

# Create junction point shapefile
network_links <- rbind(
  dams_snapped_reduced_joined %>% 
    mutate(type = "dam", id_barrier = id_dam) %>%
    dplyr::select(type, id_barrier, pass_u, pass_d),
  river_joins %>% mutate(type = "joint") %>%
    dplyr::select(type) %>%
    mutate(id_barrier = NA, pass_u = NA, pass_d = NA) %>%
    rename(geometry = x)) %>%
  mutate(id_links = 1:nrow(.))

# Split river network
river_net_simplified <- lwgeom::st_split(shape_river_simple, network_links) %>%
  st_collection_extract(.,"LINESTRING") %>%
  data.frame(NodeID = 1:length(.), geometry = .) %>%
  st_as_sf() %>%
  mutate(length = st_length(.)) %>%
  st_join(., shape_river_small, join = st_is_within_distance, dist = 0.01 ) %>% 
  group_by(NodeID) %>%
  slice(which.max(UPLAND_SKM)) %>% 
  ungroup()

A plot can reveal if the process was successful.

ggplot() +
  coord_fixed() +
  ggspatial::layer_spatial(river_net_simplified, color = "gray70")+
  ggspatial::layer_spatial(network_links, aes(shape = type))+
  scale_shape_manual(name = "Splitting points", values=c("dam" =17,"joint" = 23))+
  theme_minimal() +
  theme(legend.position = "bottom")+
  ggspatial::annotation_north_arrow(location = "br")+
  ggspatial::annotation_scale(location = "bl", style = "ticks")

Sometimes, the HydroSHEDS-derived shapefiles might have confluences where four or more rivers join. Such type of confluences can cause problems in the next steps. The function multiple_confluences (Appendix B) identifies such problematic confluences so that they can be corrected in a GIS environment. The correction consists in editing the river shapefile to move re-locate one of the reaches (either upstream or downstream) on the intersections that have more than three reaches (this can be easily achieved once these intersections are located by the following function).

Once edited, re-run the tutorial and check again that there are no more four reaches confluences.

confluences <- multiple_confluences(river_net_simplified)
head(confluences)
## Simple feature collection with 6 features and 3 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -4.047917 ymin: 42.78958 xmax: -1.364583 ymax: 42.97292
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 4
##   NodeID             geometry n_confluences flag_confluences
##    <int>          <POINT [°]>         <int> <lgl>           
## 1      1  (-3.46875 42.95208)             3 FALSE           
## 2      2 (-4.047917 42.97292)             3 FALSE           
## 3      5 (-3.552083 42.96875)             3 FALSE           
## 4      7 (-1.364583 42.81042)             3 FALSE           
## 5     10 (-1.835417 42.90625)             3 FALSE           
## 6     11 (-3.352083 42.78958)             3 FALSE

Another common problem in HydroSHEDS-derived shapefiles is the presence of gaps between segments. This issue can cause problems in the network_creation function, which expects a shapefile with no gaps. The functions gaps_detection and check_components (Appendix B) can be used to detect such problematic points in the shapefile. In particular, the function check_components returns an sf file that is a copy of the original river_net_simplified with the additional field component that contains a progressive number identifying multiple components. A correct shapefile should contain only ‘1’ values in this field. A plot can help identifying the location of those components and a manual correction in a gis environment can fix the problem.

Once edited, re-run the tutorial and check again that the shapefile has only one component.

shp_check <- check_components(network_links, river_net_simplified)
head(shp_check)
## Simple feature collection with 6 features and 2 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: -4.277083 ymin: 42.93958 xmax: -3.464583 ymax: 43.11458
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 3
##   NodeID                                                        geometry compo…¹
##    <int>                                                <LINESTRING [°]>   <dbl>
## 1      1 (-3.58125 43.11458, -3.572917 43.10625, -3.572917 43.10208, -3…       1
## 2      2 (-4.277083 43.01042, -4.272917 43.01458, -4.264583 43.01458, -…       1
## 3      3 (-3.95625 43.00625, -3.960417 43.00625, -3.964583 43.00208, -3…       1
## 4      4 (-3.927083 43.01458, -3.94375 43.01458, -3.947917 43.01042, -3…       1
## 5      5 (-3.602083 43.01458, -3.597917 43.01042, -3.589583 43.01042, -…       1
## 6      6 (-3.74375 43.00208, -3.739583 42.99792, -3.739583 42.98958, -3…       1
## # … with abbreviated variable name ¹​component

2.5 Adding additional attributes to the river shapefile

Additional attributes useful for the network calculation can be added to the river shapefile. Elevation is an example. Here we use the function elevatr::get_elev_raster to get the elevation raster. Then, the raster is converted to data frame, and elevation is joined with the shapefile based on the nearest point. This procedure is approximated, but is fine for most of the large-scale analyses.

# get DEM and transform to data frame with coordinates
elevation <- get_elev_raster(shape_basin, z = 8)
catchment_DEM <- raster::as.data.frame(elevation, xy = TRUE)

# Get coordinates of the river network segments
river_net_simplified_centroids <- river_net_simplified %>%
  st_as_sf() %>%
  st_centroid()

# Get coordinates of both elements for joining
Coord_Edges <- st_coordinates(river_net_simplified_centroids) #Coordinates of the joins
Coord_DEM <- catchment_DEM[,1:2] #Coordinates of the dams

# Matching each centroid with its closer altittude point to later obtain the altitudes
matching_altitudes <- RANN::nn2(data=Coord_DEM, query = Coord_Edges, k=1, radius = 1)[[1]] 

# Get values and add to the river shapefile
catchment_DEM <- catchment_DEM[matching_altitudes,3]
river_net_simplified <- river_net_simplified %>% 
  mutate(alt = catchment_DEM)

# To avoid issues in the network creation process, retain only the important columns
river_net_simplified <- river_net_simplified %>% 
  dplyr::select(NodeID, length, alt, DIST_DN_KM, UPLAND_SKM)

Plotting can show weird behaviors of the join operation.

ggplot() +
  coord_fixed() +
  ggspatial::layer_spatial(river_net_simplified, aes(color = alt))+
  scale_color_viridis(name = "Elevation")+
  theme_minimal() +
  theme(legend.position = "bottom")+
  ggspatial::annotation_north_arrow(location = "br")+
  ggspatial::annotation_scale(location = "bl", style = "ticks")

3. Creation of the igraph object

The igraph object can be created using the functions get_outlet and create_network (see Appendix B). The function get_outlet identifies the outlet of the catchment based on the intersection between the river network shapefile and the catchment border. This function is meant to be used with hydroSHEDS-derived data, but the identification can be done manually on any GIS software. The function create_network uses the outlet information, the properly sliced river network shapefile and the junctions shapefile.

It is important to note that: * for using this function the fields with barriers informations must be named ‘id_barrier’, ‘pass_u’ and ‘pass_d’; * in case several fields are available in the river_net_simplified and network_links shapefiles, they should have fairly different names (e.g. fields named elevation and elev should not be present, because a partial match based on field names is done internally in the create_network function). The functions dplyr::select and dplyr::rename can be used to retain only the few fields that are relevant for the next steps.

#outlet <- get_outlet(river_net_simplified, shape_basin, distance = 1)
outlet <- river_net_simplified$NodeID[river_net_simplified$DIST_DN_KM == 0 ]

river_graph <- create_network(network_links, river_net_simplified, outlet)

river_graph is an object of class igraph that keeps the attributes present in the shapefile and in the junction.

# Check igraph object
river_graph
## IGRAPH 3d12149 DN-- 648 647 -- 
## + attr: name (v/n), length (v/n), alt (v/n), DIST_DN_KM (v/n),
## | UPLAND_SKM (v/n), type (e/c), id_links (e/n), id_barrier (e/c),
## | pass_u (e/n), pass_d (e/n)
## + edges from 3d12149 (vertex names):
##  [1] 517->518 522->518 521->522 498->522 276->160 153->160 160->164 162->161
##  [9] 161->164 644->642 648->642 643->645 647->645 640->637 642->637 638->631
## [17] 646->631 634->633 641->633 631->630 633->632 632->630 630->629 629->626
## [25] 636->635 635->626 625->628 623->625 598->625 639->628 624->622 627->622
## [33] 595->598 589->598 618->594 645->618 619->618 621->620 620->594 612->606
## [41] 622->606 611->615 613->615 626->610 637->610 608->602 617->616 616->602
## + ... omitted several edges

# check river_graph edges
igraph::get.edge.attribute(river_graph) %>% names
## [1] "type"       "id_links"   "id_barrier" "pass_u"     "pass_d"

# check river_graph vertices
igraph::get.vertex.attribute(river_graph) %>% names
## [1] "name"       "length"     "alt"        "DIST_DN_KM" "UPLAND_SKM"

3.1 Editing the igraph object

The edges and verices attributes can be further edited. For instance, the ‘length’ vertices attributes is expressed in meters. The value is changed to tens of kilometers for the following calculations.

# update length attribute
V(river_graph)$length <- V(river_graph)$length / 10000
hist(V(river_graph)$length)

Next, the name attribute is converted from numeric to charachter to avoid issues with the index_calculation function.

# update length attribute
V(river_graph)$name <- as.character(V(river_graph)$name)

Another step is to add some fields with habitat suitability information. Two different habitat curves are implemented to link elevation and habitat preference, one for high-altitude organisms and one for low-altitude organisms. Relative reach-specific habitat suitability indices (HSI) are added as attributes of the vertices. Weighted usable length (WUL) is also calculated as the product of length and HSI as the reach-specific fraction of length that is available for organisms.


# Function for organism that prefers high elevation
suit_fun_high <- function(x){dnorm(x, mean = 1500, sd = 500)*410*sqrt(3*pi)}

# Function for organism that prefers low elevation
suit_fun_low <- function(x){exp(- 0.001*x)}

# Calculate HSI for the igraph nodes
V(river_graph)$HSI_low <- suit_fun_low(V(river_graph)$alt)
V(river_graph)$HSI_high <- suit_fun_high(V(river_graph)$alt)

# Calculate weighted usable length for igraph nodes
V(river_graph)$WUL_low <- V(river_graph)$HSI_low * V(river_graph)$length
V(river_graph)$WUL_high <- V(river_graph)$HSI_high * V(river_graph)$length

# Plot the two response functions
1:10:2500 %>%
  data.frame("Elevation" = .,
             "Low" = suit_fun_low(.), 
             "High" = suit_fun_high(.)) %>%
  pivot_longer(c("Low", "High"), 
               names_to = "Type", values_to = "HSI") %>%
  ggplot() + 
  geom_line(aes(x = Elevation, y = HSI, color = Type))+
  theme_bw() + xlab("Elevation (m)") + ylab("Habitat Suitability Index (HSI)")

3.2 Plotting the igraph object

The ggnetwork package can be used to fortify igraph objects for ggplot-friendly plots. For a nice network layout that resembles the geometry of the starting network, the centroids of the reaches are extracted. The habitat suitabilities are plotted.

# Extract reaches centroids
river_net_simplified_centroids <- river_net_simplified %>%
  st_as_sf() %>%
  st_centroid() 

# get the centroids coordinates
coordinates <- river_net_simplified_centroids %>%
  dplyr::mutate(lat = sf::st_coordinates(.)[,1], lon = sf::st_coordinates(.)[,2]) %>%
  dplyr::select(lat, lon) %>%
  st_set_geometry( NULL)

# fortify the igraph object
gg0 <- ggnetwork(river_graph, layout = coordinates %>% as.matrix(), scale = FALSE)

grid.arrange(
ggplot(gg0, aes(x = x, y = y, xend = xend, yend = yend)) +
  coord_fixed() +
  geom_nodes(aes(color = HSI_high)) +
  geom_edges(alpha = 0.5) +
  scale_color_viridis()+
  theme_blank()+
  ggtitle("High-altitude organism") +
  labs(caption = "Network directionality not shown"), 

ggplot(gg0, aes(x = x, y = y, xend = xend, yend = yend)) +
  coord_fixed() +
  geom_nodes(aes(color = WUL_high)) +
  geom_edges(alpha = 0.5) +
  scale_color_viridis()+
  theme_blank()+
  ggtitle("High-altitude organism") +
  labs(caption = "Network directionality not shown"), 

ggplot(gg0, aes(x = x, y = y, xend = xend, yend = yend)) +
  coord_fixed() +
  geom_nodes(aes(color = HSI_low)) +
  geom_edges(alpha = 0.5) +
  scale_color_viridis()+
  theme_blank()+
  ggtitle("Low-altitude organism") +
  labs(caption = "Network directionality not shown"),

ggplot(gg0, aes(x = x, y = y, xend = xend, yend = yend)) +
  coord_fixed() +
  geom_nodes(aes(color = WUL_low)) +
  geom_edges(alpha = 0.5) +
  scale_color_viridis()+
  theme_blank()+
  ggtitle("Low-altitude organism") +
  labs(caption = "Network directionality not shown"),
  ncol=2, nrow=2)

4. Indices calculations

Here some example of indices for river network connectivity applied to river graphs are presented.

4.1 Reach-scale indices

The funtion index_calculation from the riverconn package is used to calculate reach-scale connectivity indices (RCI). Refer to the package vignette for details on the calculation of the indices. For comparison purposes, different indices are calculated: (A) Dendritic connectivity index with symmetric dams passabilities; (B) Dendritic connectivity index with asymmetric dams passabilities; (C) Integral Index of connectivity (dams are not passable and a threshold on dispersal distance is used); (D) Integral Index of connectivity with uniform weights; (E) lowland-preferring organisms with active aquatic dispersal (e.g. fish), (F) upland organisms with active aquatic dispersal (e.g. fish), (G) lowland-preferring organism with passive aquatic dispersal (e.g., invertebrate larval stage), (H) upland organism with passive aquatic dispersal (e.g., invertebrate larval stage, bivalve). Note that for all those indices, the option index_mode = "from" was used, meaning paths exiting from the node are considered for calculating the coincidence probabilities. Thus, such indices can be interpreted as the potential for each node to support dispersal of organisms to the whole river network. An alternative could be to specify index_mode = "to". In this case, only incoming links are considered for the calculation of the index, yielding a prioritization based on the node potential for recolonization. Note that when symmetric dispersal and symmetric barriers passability are used, the two indices should overlap.

Before calculating indices, you should consider carefully the size of the network and its complexity. The index_calculation function might fail for large networks on comuters with limited available RAM because R would not be able to allocate enough memory to efficiently store the \(n * n\) matrices generated in the process. For example, the function was failing with a network size of 10^5 nodes on a computer with 8 GB RAM, but was succesfull on a computer with 32 GB RAM. Few tests might be needed to ballance complexity and computational resources.

# Initialize list where to store all the index calculation outputs
index <- list()
lab_index <- list()
letter_index <- list()

# 1: Symmetric Dendritic Connectivity Index (no biotic effects)
lab_index[[1]] <- "Symmetric DCI"
letter_index[[1]] <- "A"
index[[1]] <- index_calculation(graph = river_graph,
                                weight = "length",
                                B_ij_flag = FALSE,
                                index_type = "reach",
                                index_mode = "from")

# 2: Asymmetric Dendritic Connectivity Index (no biotic effects)
lab_index[[2]] <- "Asymmetric DCI"
letter_index[[2]] <- "B"
index[[2]] <- index_calculation(graph = river_graph,
                                weight = "length",
                                B_ij_flag = FALSE,
                                dir_fragmentation_type = "asymmetric",
                                index_type = "reach",
                                index_mode = "from")

## Before calculating IIC a binary passability has to be defined
E(river_graph)$pass_u_bin <- ifelse(is.na(E(river_graph)$pass_u), NA, 0)
E(river_graph)$pass_d_bin <- ifelse(is.na(E(river_graph)$pass_d), NA, 0)

# 3: Symmetric Integral Index of Connectivity
lab_index[[3]] <- "IIC"
letter_index[[3]] <- "C"
index[[3]] <- index_calculation(graph = river_graph,
                                weight = "length",
                                pass_u = "pass_u_bin",
                                pass_d = "pass_d_bin",
                                param = 3,
                                disp_type = "threshold",
                                index_type = "reach",
                                index_mode = "from")

## Defining uniform weights for IIC
V(river_graph)$unif_w <- 1

# 4: Integral Index of Connectivity with uniform weights
lab_index[[4]] <- "IIC with uniform weights"
letter_index[[4]] <- "D"
index[[4]] <- index_calculation(graph = river_graph,
                                weight = "unif_w",
                                dir_fragmentation_type = "asymmetric",
                                pass_u = "pass_u_bin",
                                pass_d = "pass_d_bin",
                                param = 3,
                                disp_type = "threshold",
                                index_type = "reach",
                                index_mode = "from")

# 5: Population Connectivity Index for lowland fish
lab_index[[5]] <- "PCI lowland fish"
letter_index[[5]] <- "E"
index[[5]] <- index_calculation(graph = river_graph,
                                weight = "WUL_low", 
                                param = 0.8,
                                index_type = "reach",
                                index_mode = "from") 

# 6: Population Connectivity Index for upland fish
lab_index[[6]] <- "PCI upland fish"
letter_index[[6]] <- "F"
index[[6]] <- index_calculation(graph = river_graph,
                                weight = "WUL_high", 
                                param = 0.8,
                                index_type = "reach",
                                index_mode = "from") 


# 7: Population Connectivity Index for lowland passive drifter
## note that units of param_d must be consistent with length field (10s of km)
lab_index[[7]] <- "PCI lowland passive"
letter_index[[7]] <- "G"
index[[7]] <- index_calculation(graph = river_graph,
                                weight = "WUL_low", 
                                dir_distance_type  = "asymmetric",
                                disp_type = "threshold", 
                                param_u = 0, 
                                param_d = 3,
                                index_type = "reach",
                                index_mode = "from")

# 8: Population Connectivity Index for upland passive drifter
## note that units of param_d must be consistent with length field (10s of km)
lab_index[[8]] <- "PCI upland passive"
letter_index[[8]] <- "H"
index[[8]] <- index_calculation(graph = river_graph,
                                weight = "WUL_high", 
                                dir_distance_type  = "asymmetric",
                                disp_type = "threshold",
                                param_u = 0, 
                                param_d = 3,
                                index_type = "reach",
                                index_mode = "from")

A quick check to the variation range.


data.frame("index" = do.call(rbind, lab_index),
           "min" = t(sapply(index, sapply, min))[,4], 
           "mean" = t(sapply(index, sapply, mean))[,4],
           "max" = t(sapply(index, sapply, max))[,4] )
##                      index                  min        mean                max
## 1            Symmetric DCI 0.000136068138828111 0.089094209  0.183348672040763
## 2           Asymmetric DCI 0.000198840008657542 0.243700325  0.575848057363519
## 3                      IIC 4.21802688098917e-05 0.007121115 0.0209559901505482
## 4 IIC with uniform weights  0.00154320987654321 0.008921087 0.0308641975308642
## 5         PCI lowland fish 5.49335271641571e-05 0.016012598 0.0536956831924345
## 6          PCI upland fish 0.000182545268578881 0.009541135 0.0410567699684218
## 7      PCI lowland passive 3.79438573745419e-05 0.003693739  0.020852867904853
## 8       PCI upland passive 1.66202468527347e-05 0.003000132 0.0353297075947445

Resulting maps can be plotted to visually compare the results. For comparisons between different indices, reach ranks are plotted. The lowest rank is the total number of reaches (ca. 650). Reaches with higher rankings offer higher potential for being dispersal sources.

# Initialize empty list
plot_list <- list()

# iterate through list length
for (i in 1:length(index)) {
  
  # Join d_i information with dams shapefile
  river_plot <- river_net_simplified %>%
    mutate(name = as.character(NodeID)) %>%
    left_join(index[[i]]) %>%
    mutate(rank = rank(desc(index)))

  # plot
  plot_iter <- ggplot() +
    coord_fixed() +
    ggspatial::layer_spatial(river_plot, aes(color = rank))+
    scale_color_viridis(name = "Ranking", direction = -1)+
    theme_void()+
    guides(size = FALSE) +
    theme(legend.position = "bottom")+
    ggtitle(paste0(letter_index[[i]], ") ", lab_index[[i]]))
  
    legend <- cowplot::get_legend(plot_iter)
  
  plot_list[[i]] <- plot_iter + 
    theme(legend.position = "none")

}

plot_list[[length(index)+1]] <- legend

plot_final <- ggpubr::ggarrange( plotlist = plot_list)
plot_final


ggsave(plot = plot_final, "Figures/habitat_prioritization.jpeg",  
       width = 18, height = 12, units = "cm")

A correlation plot helps identifying the correlated indices

d_index_chart <- data.frame("d_i_1" = index[[1]]$index, 
                            "d_i_2" = index[[2]]$index,
                            "d_i_3" = index[[3]]$index,
                            "d_i_4" = index[[4]]$index,
                            "d_i_5" = index[[5]]$index,
                            "d_i_6" = index[[6]]$index,
                            "d_i_7" = index[[7]]$index,
                            "d_i_8" = index[[8]]$index)

colnames(d_index_chart) <- c(letter_index[[1]], letter_index[[2]], letter_index[[3]], 
                             letter_index[[4]], letter_index[[5]], letter_index[[6]],
                             letter_index[[7]], letter_index[[8]])

# Spearman corrlations between rankings calculated with different priorities
cor(d_index_chart, method = "spearman")
##            A           B         C          D          E           F         G
## A  1.0000000  0.91224397 0.5542838  0.4941878  0.8789122  0.19375360 0.3691311
## B  0.9122440  1.00000000 0.4955071  0.4322901  0.8570346 -0.08416734 0.3894249
## C  0.5542838  0.49550709 1.0000000  0.8277318  0.7113399  0.41582814 0.5582716
## D  0.4941878  0.43229005 0.8277318  1.0000000  0.7109633  0.30121569 0.4021395
## E  0.8789122  0.85703457 0.7113399  0.7109633  1.0000000  0.13442831 0.5122386
## F  0.1937536 -0.08416734 0.4158281  0.3012157  0.1344283  1.00000000 0.1011033
## G  0.3691311  0.38942489 0.5582716  0.4021395  0.5122386  0.10110333 1.0000000
## H -0.2105778 -0.34538754 0.1183752 -0.1075774 -0.2894423  0.55415117 0.4008371
##            H
## A -0.2105778
## B -0.3453875
## C  0.1183752
## D -0.1075774
## E -0.2894423
## F  0.5541512
## G  0.4008371
## H  1.0000000
corrmorant::corrmorant(d_index_chart, corr_method = "spearman", style = "binned")+
  theme(legend.position = "bottom") +
  scale_color_viridis(name = "Correlation", option = "E", direction = -1, limits = c(-1, 1)) +
  theme(axis.text = element_text(size=5),
        axis.text.x = element_text(angle = 45) )

ggsave("Figures/ranking_habitat_comparison.jpeg",  width = 15, height = 15, units = "cm")

4.2 Barriers prioritization

The function d_index_calculation from the riverconn package is used to identify barriers whose passability improvement can best benefit the catchment connectivity. Refer to the package vignette for details on the calculation of the indices. Check the package vignette for details on the indices used to this end. Here we use the same setups described above (section 4.1).

First dams metadata are to be defined. The object is saved as a dataframe that stores the id of the dams to be individually tested in a column, and the relative improvements in upstream and downstream passabilies (here improvement is assumed to be uniformly 1).

barriers_metadata <- data.frame("id_barrier" =  E(river_graph)$id_barrier[!is.na(E(river_graph)$id_barrier)],
                            "pass_u_updated" = 1,
                            "pass_d_updated" = 1)
head(barriers_metadata)
##   id_barrier pass_u_updated pass_d_updated
## 1         76              1              1
## 2         72              1              1
## 3          7              1              1
## 4          6              1              1
## 5         78              1              1
## 6         51              1              1

Next, the dams prioritization can be executed. The calculation may take a while depending on the computer where it is run. There is the option to parallelize the process (marking -> parallel=TRUE and the number of cores devoted -> ncores= ). It takes 4 minutes with parallel=FALSE for this example in a computer with this procesor: 11th Gen Intel Core i7, 3.30 GHz and 32 GbRAM. On the other hand it takes 1.5 minutes when parallelizing with 7 cores (parallel = TRUE, ncores = 7). To parallelize calculations, the doParallel package is required.

# Initialize list where to store all the index calculation outputs
d_index <- list()
lab_d_index <- list()
letter_d_index <- list()

# 1: Symmetric Dendritic Connectivity Index (no biotic effects)
lab_d_index[[1]] <- "Symmetric DCI"
letter_d_index[[1]] <- "A"
d_index[[1]] <- d_index_calculation(graph = river_graph, 
                                    barriers_metadata = barriers_metadata,
                                    B_ij_flag = FALSE, 
                                    parallel = FALSE)

# 2: Asymmetric Dendritic Connectivity Index (no biotic effects)
lab_d_index[[2]] <- "Asymmetric DCI"
letter_d_index[[2]] <- "B"
d_index[[2]] <- d_index_calculation(graph = river_graph, 
                                    barriers_metadata = barriers_metadata, 
                                    B_ij_flag = FALSE, 
                                    parallel = FALSE, 
                                    dir_fragmentation_type = "asymmetric")

## Before calculating IIC a binary passability has to be defined
E(river_graph)$pass_u_bin <- ifelse(is.na(E(river_graph)$pass_u), NA, 0)
E(river_graph)$pass_d_bin <- ifelse(is.na(E(river_graph)$pass_d), NA, 0)

# 3: Symmetric Integral Index of Connectivity
lab_d_index[[3]] <- "Symmetric IIC"
letter_d_index[[3]] <- "C"
d_index[[3]] <- d_index_calculation(graph = river_graph,
                                    barriers_metadata = barriers_metadata, 
                                    weight = "WUL_low",
                                    pass_u = "pass_u_bin",
                                    pass_d = "pass_d_bin",
                                    param = 3,
                                    disp_type = "threshold",
                                    parallel = FALSE)

## Defining uniform weights for IIC
V(river_graph)$unif_w <- 1

# 4: Integral Index of Connectivity with uniform weights
lab_d_index[[4]] <- "IIC with uniform weights"
letter_d_index[[4]] <- "D"
d_index[[4]] <- d_index_calculation(graph = river_graph,
                                    barriers_metadata = barriers_metadata, 
                                    weight = "unif_w",
                                    dir_fragmentation_type = "asymmetric",
                                    pass_u = "pass_u_bin",
                                    pass_d = "pass_d_bin",
                                    param = 3,
                                    disp_type = "threshold",
                                    parallel = FALSE)

# 5: Population Connectivity Index for lowland fish
lab_d_index[[5]] <- "PCI lowland fish"
letter_d_index[[5]] <- "E"
d_index[[5]] <- d_index_calculation(graph = river_graph,
                                    barriers_metadata = barriers_metadata, 
                                    weight = "WUL_low", 
                                    param = 0.8,
                                    parallel = FALSE) 

# 6: Population Connectivity Index for upland fish
lab_d_index[[6]] <- "PCI upland fish"
letter_d_index[[6]] <- "F"
d_index[[6]] <- d_index_calculation(graph = river_graph,
                                    barriers_metadata = barriers_metadata, 
                                    weight = "WUL_high", 
                                    param = 0.8,
                                    parallel = FALSE )


# 7: Population Connectivity Index for lowland passive drifter
## note that units of param_d must be consistent with length field (10s of km)
lab_d_index[[7]] <- "PCI lowland passive"
letter_d_index[[7]] <- "G"
d_index[[7]] <- d_index_calculation(graph = river_graph,
                                    barriers_metadata = barriers_metadata, 
                                    weight = "WUL_low", 
                                    dir_distance_type  = "asymmetric",
                                    disp_type = "threshold", 
                                    param_u = 0, 
                                    param_d = 3,
                                    parallel = FALSE)

# 8: Population Connectivity Index for upland passive drifter
## note that units of param_d must be consistent with length field (10s of km)
lab_d_index[[8]] <- "PCI upland passive"
letter_d_index[[8]] <- "H"
d_index[[8]] <- d_index_calculation(graph = river_graph,
                                    barriers_metadata = barriers_metadata, 
                                    weight = "WUL_high", 
                                    dir_distance_type  = "asymmetric",
                                    disp_type = "threshold",
                                    param_u = 0, 
                                    param_d = 3,
                                    parallel = FALSE)

# 9: Catchment Area Fragmentation Index
## note that units of param_d must be consistent with length field (10s of km)
lab_d_index[[9]] <- "CAFI"
letter_d_index[[9]] <- "I"
d_index[[9]] <- d_index_calculation(graph = river_graph,
                                    barriers_metadata = barriers_metadata, 
                                    weight = "UPLAND_SKM", 
                                    dir_distance_type  = "symmetric",
                                    B_ij_flag = FALSE,
                                    parallel = FALSE)

A quick check to the variation range of CCI for the baseline.


data.frame("Baseline CCI" = do.call(rbind, lab_d_index),
           "min" = t(sapply(d_index, sapply, min))[,4], # baseline in is column 4
           "mean" = t(sapply(d_index, sapply, mean))[,4],
           "max" = t(sapply(d_index, sapply, max))[,4] )
##               Baseline.CCI                 min        mean                 max
## 1            Symmetric DCI  0.0909335515845068 0.092566524   0.126107269330024
## 2           Asymmetric DCI    0.25182200888365 0.254232665   0.290384217177615
## 3            Symmetric IIC 0.00810297139863181 0.008114263 0.00817098728639399
## 4 IIC with uniform weights 0.00892108672458467 0.008945654 0.00905445054107606
## 5         PCI lowland fish  0.0173825827122186 0.017454205  0.0183293467389933
## 6          PCI upland fish  0.0149304263813881 0.014969916  0.0155246121675143
## 7      PCI lowland passive 0.00494555720870629 0.004950211 0.00497254428181167
## 8       PCI upland passive 0.00811555022541386 0.008120642 0.00819221692543936
## 9                     CAFI   0.203495022267748 0.206691818   0.299794408043562

Resulting maps can be plotted to visually compare the results. For comparisons between different indices, barriers ranks are plotted. The lowest rank is the total number of barriers (ca. 80). Barriers with higher ranks should be prioritized for passability improvement.

# Initialize empty list
plot_list <- list()

# iterate through list length
for (i in 1:length(d_index)) {
  
  # Join d_i information with dams shapefile
  network_links_plot <- network_links %>%
    filter(type == "dam") %>% 
    left_join(d_index[[i]]) %>%
    mutate(d_rank = rank(desc(d_index)))

  # plot
  plot_iter <- ggplot() +
    coord_fixed() +
    ggspatial::layer_spatial(river_net_simplified, color = "gray70")+
    ggspatial::layer_spatial(network_links_plot, 
                             aes(color = d_rank, size = 1/d_rank), alpha = 0.8)+
    scale_color_viridis(name = "Ranking", direction = -1)+
    theme_void()+
    guides(size = FALSE) +
    theme(legend.position = "bottom")+
    ggtitle(paste0(letter_d_index[[i]], ") ", lab_d_index[[i]]))
    
  # legend <- cowplot::get_legend(plot_iter)
  
  plot_list[[i]] <- plot_iter #+ theme(legend.position = "none")

}

# plot_list[[length(d_index)+1]] <- legend

plot_final <- ggpubr::ggarrange( plotlist = plot_list, 
                                 common.legend = TRUE, legend = "bottom")
plot_final


ggsave(plot = plot_final, "Figures/barriers_prioritization.jpeg",  
       width = 18, height = 12, units = "cm")

A corrleation plot hepls identifying the correlated indices

d_index_chart <- data.frame("d_i_1" = d_index[[1]]$d_index, 
                            "d_i_2" = d_index[[2]]$d_index,
                            "d_i_3" = d_index[[3]]$d_index,
                            "d_i_4" = d_index[[4]]$d_index,
                            "d_i_5" = d_index[[5]]$d_index,
                            "d_i_6" = d_index[[6]]$d_index,
                            "d_i_7" = d_index[[7]]$d_index,
                            "d_i_8" = d_index[[8]]$d_index,
                            "d_i_9" = d_index[[9]]$d_index)

colnames(d_index_chart) <- c(letter_d_index[[1]], letter_d_index[[2]], letter_d_index[[3]], 
                             letter_d_index[[4]], letter_d_index[[5]], letter_d_index[[6]],
                             letter_d_index[[7]], letter_d_index[[8]], letter_d_index[[9]])

# Spearman corrlations between rankings calculated with different priorities
cor(d_index_chart, method = "spearman")
##            A          B           C          D          E         F          G
## A  1.0000000  0.8636478 0.637634905 0.26204708  0.9408875 0.3693169 0.64327608
## B  0.8636478  1.0000000 0.648567309 0.29080852  0.8753639 0.2381299 0.67678719
## C  0.6376349  0.6485673 1.000000000 0.54545734  0.7551617 0.2080936 0.97127579
## D  0.2620471  0.2908085 0.545457344 1.00000000  0.3773260 0.2458078 0.47141151
## E  0.9408875  0.8753639 0.755161746 0.37732603  1.0000000 0.2660974 0.73870015
## F  0.3693169  0.2381299 0.208093618 0.24580782  0.2660974 1.0000000 0.23057434
## G  0.6432761  0.6767872 0.971275792 0.47141151  0.7387001 0.2305743 1.00000000
## H -0.2640295 -0.3216590 0.009140793 0.03524209 -0.3429779 0.5990565 0.05841429
## I  0.8891517  0.9147256 0.640364507 0.30417163  0.8815929 0.1295633 0.65949971
##              H          I
## A -0.264029452  0.8891517
## B -0.321659038  0.9147256
## C  0.009140793  0.6403645
## D  0.035242088  0.3041716
## E -0.342977925  0.8815929
## F  0.599056538  0.1295633
## G  0.058414289  0.6594997
## H  1.000000000 -0.3937625
## I -0.393762511  1.0000000
corrmorant::corrmorant(d_index_chart, corr_method = "spearman", style = "binned")+
  theme(legend.position = "bottom") +
  scale_color_viridis(name = "Correlation", option = "E", direction = -1, limits = c(-1, 1)) +
  theme(axis.text = element_text(size=5),
        axis.text.x = element_text(angle = 45))

ggsave("Figures/ranking comparison.jpeg",  width = 15, height = 15, units = "cm")

Appendix A: R session information

devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.2.1 (2022-06-23 ucrt)
##  os       Windows 10 x64 (build 19045)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  English_United States.utf8
##  ctype    English_United States.utf8
##  tz       Europe/Berlin
##  date     2023-03-02
##  pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  ! package       * version    date (UTC) lib source
##    abind           1.4-5      2016-07-21 [1] CRAN (R 4.2.0)
##    assertthat      0.2.1      2019-03-21 [1] CRAN (R 4.2.1)
##    backports       1.4.1      2021-12-13 [1] CRAN (R 4.2.0)
##    bookdown        0.32       2023-01-17 [1] CRAN (R 4.2.2)
##    broom           1.0.1      2022-08-29 [1] CRAN (R 4.2.1)
##    bslib           0.4.1      2022-11-02 [1] CRAN (R 4.2.2)
##    cachem          1.0.6      2021-08-19 [1] CRAN (R 4.2.1)
##    callr           3.7.3      2022-11-02 [1] CRAN (R 4.2.2)
##    car             3.1-1      2022-10-19 [1] CRAN (R 4.2.2)
##    carData         3.0-5      2022-01-06 [1] CRAN (R 4.2.1)
##    cellranger      1.1.0      2016-07-27 [1] CRAN (R 4.2.1)
##    class           7.3-20     2022-01-16 [2] CRAN (R 4.2.1)
##    classInt        0.4-8      2022-09-29 [1] CRAN (R 4.2.2)
##    cli             3.4.0      2022-09-08 [1] CRAN (R 4.2.1)
##    codetools       0.2-18     2020-11-04 [2] CRAN (R 4.2.1)
##    colorspace      2.0-3      2022-02-21 [1] CRAN (R 4.2.1)
##    corrmorant    * 0.0.0.9007 2022-10-13 [1] Github (r-link/corrmorant@e968860)
##    cowplot       * 1.1.1      2020-12-30 [1] CRAN (R 4.2.1)
##    crayon          1.5.2      2022-09-29 [1] CRAN (R 4.2.2)
##    curl            4.3.3      2022-10-06 [1] CRAN (R 4.2.2)
##    DBI             1.1.3      2022-06-18 [1] CRAN (R 4.2.1)
##    dbplyr          2.2.1      2022-06-27 [1] CRAN (R 4.2.1)
##    devtools        2.4.5      2022-10-11 [1] CRAN (R 4.2.2)
##    digest          0.6.31     2022-12-11 [1] CRAN (R 4.2.2)
##    dodgr           0.2.18     2022-12-07 [1] CRAN (R 4.2.2)
##    doParallel      1.0.17     2022-02-07 [1] CRAN (R 4.2.1)
##    dplyr         * 1.0.10     2022-09-01 [1] CRAN (R 4.2.1)
##    e1071           1.7-12     2022-10-24 [1] CRAN (R 4.2.2)
##    elevatr       * 0.4.2      2022-01-07 [1] CRAN (R 4.2.2)
##    ellipsis        0.3.2      2021-04-29 [1] CRAN (R 4.2.1)
##    evaluate        0.19       2022-12-13 [1] CRAN (R 4.2.1)
##    fansi           1.0.3      2022-03-24 [1] CRAN (R 4.2.1)
##    farver          2.1.1      2022-07-06 [1] CRAN (R 4.2.1)
##    fastmap         1.1.0      2021-01-25 [1] CRAN (R 4.2.1)
##    forcats       * 0.5.2      2022-08-19 [1] CRAN (R 4.2.1)
##    foreach         1.5.2      2022-02-02 [1] CRAN (R 4.2.1)
##    fs              1.5.2      2021-12-08 [1] CRAN (R 4.2.1)
##    gargle          1.2.1      2022-09-08 [1] CRAN (R 4.2.1)
##    generics        0.1.3      2022-07-05 [1] CRAN (R 4.2.1)
##    ggnetwork     * 0.5.10     2021-07-06 [1] CRAN (R 4.2.2)
##    ggplot2       * 3.4.0      2022-11-04 [1] CRAN (R 4.2.2)
##    ggpubr        * 0.5.0      2022-11-16 [1] CRAN (R 4.2.2)
##    ggsignif        0.6.4      2022-10-13 [1] CRAN (R 4.2.2)
##    ggspatial     * 1.1.7      2022-11-24 [1] CRAN (R 4.2.2)
##    glue            1.6.2      2022-02-24 [1] CRAN (R 4.2.1)
##    googledrive     2.0.0      2021-07-08 [1] CRAN (R 4.2.1)
##    googlesheets4   1.0.1      2022-08-13 [1] CRAN (R 4.2.1)
##    gridExtra     * 2.3        2017-09-09 [1] CRAN (R 4.2.1)
##    gtable          0.3.1      2022-09-01 [1] CRAN (R 4.2.1)
##    haven           2.5.1      2022-08-22 [1] CRAN (R 4.2.1)
##    highr           0.9        2021-04-16 [1] CRAN (R 4.2.1)
##    hms             1.1.2      2022-08-19 [1] CRAN (R 4.2.1)
##    htmltools       0.5.4      2022-12-07 [1] CRAN (R 4.2.2)
##    htmlwidgets     1.5.4      2021-09-08 [1] CRAN (R 4.2.1)
##    httpuv          1.6.6      2022-09-08 [1] CRAN (R 4.2.1)
##    httr            1.4.4      2022-08-17 [1] CRAN (R 4.2.1)
##    igraph        * 1.3.5      2022-09-22 [1] CRAN (R 4.2.2)
##    iterators       1.0.14     2022-02-05 [1] CRAN (R 4.2.1)
##    jquerylib       0.1.4      2021-04-26 [1] CRAN (R 4.2.1)
##    jsonlite        1.8.4      2022-12-06 [1] CRAN (R 4.2.2)
##    KernSmooth      2.23-20    2021-05-03 [2] CRAN (R 4.2.1)
##    knitr           1.41       2022-11-18 [1] CRAN (R 4.2.2)
##    labeling        0.4.2      2020-10-20 [1] CRAN (R 4.2.0)
##    later           1.3.0      2021-08-18 [1] CRAN (R 4.2.1)
##    lattice         0.20-45    2021-09-22 [2] CRAN (R 4.2.1)
##    lifecycle       1.0.3      2022-10-07 [1] CRAN (R 4.2.2)
##    lubridate       1.9.0      2022-11-06 [1] CRAN (R 4.2.2)
##    lwgeom        * 0.2-10     2022-11-19 [1] CRAN (R 4.2.2)
##    magrittr        2.0.3      2022-03-30 [1] CRAN (R 4.2.1)
##    memoise         2.0.1      2021-11-26 [1] CRAN (R 4.2.1)
##    mime            0.12       2021-09-28 [1] CRAN (R 4.2.0)
##    miniUI          0.1.1.1    2018-05-18 [1] CRAN (R 4.2.2)
##    modelr          0.1.10     2022-11-11 [1] CRAN (R 4.2.2)
##    munsell         0.5.0      2018-06-12 [1] CRAN (R 4.2.1)
##    osmdata         0.1.10     2022-06-09 [1] CRAN (R 4.2.1)
##    pillar          1.8.1      2022-08-19 [1] CRAN (R 4.2.1)
##    pkgbuild        1.4.0      2022-11-27 [1] CRAN (R 4.2.2)
##    pkgconfig       2.0.3      2019-09-22 [1] CRAN (R 4.2.1)
##    pkgload         1.3.2      2022-11-16 [1] CRAN (R 4.2.2)
##    plyr            1.8.8      2022-11-11 [1] CRAN (R 4.2.2)
##    prettyunits     1.1.1      2020-01-24 [1] CRAN (R 4.2.1)
##    processx        3.8.0      2022-10-26 [1] CRAN (R 4.2.2)
##    profvis         0.3.7      2020-11-02 [1] CRAN (R 4.2.2)
##    progress        1.2.2      2019-05-16 [1] CRAN (R 4.2.1)
##    progressr       0.12.0     2022-12-13 [1] CRAN (R 4.2.1)
##    promises        1.2.0.1    2021-02-11 [1] CRAN (R 4.2.1)
##    proxy           0.4-27     2022-06-09 [1] CRAN (R 4.2.1)
##    ps              1.7.2      2022-10-26 [1] CRAN (R 4.2.2)
##    purrr         * 0.3.5      2022-10-06 [1] CRAN (R 4.2.2)
##    R6              2.5.1      2021-08-19 [1] CRAN (R 4.2.1)
##    ragg            1.2.4      2022-10-24 [1] CRAN (R 4.2.2)
##    RANN          * 2.6.1      2019-01-08 [1] CRAN (R 4.2.2)
##    raster        * 3.6-11     2022-11-28 [1] CRAN (R 4.2.2)
##    Rcpp            1.0.9      2022-07-08 [1] CRAN (R 4.2.1)
##  D RcppParallel    5.1.5      2022-01-05 [1] CRAN (R 4.2.1)
##    readr         * 2.1.3      2022-10-01 [1] CRAN (R 4.2.2)
##    readxl          1.4.1      2022-08-17 [1] CRAN (R 4.2.1)
##    remotes         2.4.2      2021-11-30 [1] CRAN (R 4.2.1)
##    reprex          2.0.2      2022-08-17 [1] CRAN (R 4.2.1)
##    reshape2        1.4.4      2020-04-09 [1] CRAN (R 4.2.1)
##    rgdal           1.6-2      2022-11-09 [1] CRAN (R 4.2.2)
##    riverconn     * 0.3.23     2022-12-23 [1] local
##    rlang           1.0.6      2022-09-24 [1] CRAN (R 4.2.2)
##    rmarkdown       2.18       2022-11-09 [1] CRAN (R 4.2.2)
##    rmdformats      1.0.4      2022-05-17 [1] CRAN (R 4.2.2)
##    rstatix         0.7.1      2022-11-09 [1] CRAN (R 4.2.2)
##    rstudioapi      0.14       2022-08-22 [1] CRAN (R 4.2.1)
##    rvest           1.0.3      2022-08-19 [1] CRAN (R 4.2.1)
##    s2              1.1.1      2022-11-17 [1] CRAN (R 4.2.2)
##    sass            0.4.4      2022-11-24 [1] CRAN (R 4.2.2)
##    scales          1.2.1      2022-08-20 [1] CRAN (R 4.2.2)
##    sessioninfo     1.2.2      2021-12-06 [1] CRAN (R 4.2.2)
##    sf            * 1.0-9      2022-11-08 [1] CRAN (R 4.2.2)
##    shiny           1.7.3      2022-10-25 [1] CRAN (R 4.2.2)
##    slippymath      0.3.1      2019-06-28 [1] CRAN (R 4.2.2)
##    sp            * 1.5-1      2022-11-07 [1] CRAN (R 4.2.2)
##    stringi         1.7.8      2022-07-11 [1] CRAN (R 4.2.1)
##    stringr       * 1.5.0      2022-12-02 [1] CRAN (R 4.2.2)
##    systemfonts     1.0.4      2022-02-11 [1] CRAN (R 4.2.2)
##    terra           1.6-47     2022-12-02 [1] CRAN (R 4.2.2)
##    textshaping     0.3.6      2021-10-13 [1] CRAN (R 4.2.2)
##    tibble        * 3.1.8      2022-07-22 [1] CRAN (R 4.2.1)
##    tidyr         * 1.2.1      2022-09-08 [1] CRAN (R 4.2.1)
##    tidyselect      1.2.0      2022-10-10 [1] CRAN (R 4.2.2)
##    tidyverse     * 1.3.2      2022-07-18 [1] CRAN (R 4.2.2)
##    timechange      0.1.1      2022-11-04 [1] CRAN (R 4.2.2)
##    tzdb            0.3.0      2022-03-28 [1] CRAN (R 4.2.1)
##    units           0.8-1      2022-12-10 [1] CRAN (R 4.2.2)
##    urlchecker      1.0.1      2021-11-30 [1] CRAN (R 4.2.2)
##    usethis         2.1.6      2022-05-25 [1] CRAN (R 4.2.2)
##    utf8            1.2.2      2021-07-24 [1] CRAN (R 4.2.1)
##    vctrs           0.5.1      2022-11-16 [1] CRAN (R 4.2.2)
##    viridis       * 0.6.2      2021-10-13 [1] CRAN (R 4.2.1)
##    viridisLite   * 0.4.1      2022-08-22 [1] CRAN (R 4.2.1)
##    withr           2.5.0      2022-03-03 [1] CRAN (R 4.2.1)
##    wk              0.7.1      2022-12-09 [1] CRAN (R 4.2.2)
##    xfun            0.35       2022-11-16 [1] CRAN (R 4.2.2)
##    xml2            1.3.3      2021-11-30 [1] CRAN (R 4.2.1)
##    xtable          1.8-4      2019-04-21 [1] CRAN (R 4.2.1)
##    yaml            2.3.6      2022-10-18 [1] CRAN (R 4.2.2)
## 
##  [1] C:/Users/Damiano/AppData/Local/R/win-library/4.2
##  [2] C:/Program Files/R/R-4.2.1/library
## 
##  D ── DLL MD5 mismatch, broken installation.
## 
## ──────────────────────────────────────────────────────────────────────────────

Appendix B: Source code of the custom functions used to support the processing the data

#' Function to identify river network outlet
#'
#' @param river_network a polyline shapefile containing the river network
#' (must be of class sf)
#' @param catchment_mask a polygon shapefile with the extent of the catchment
#' (must be of class sf)
#'
#' @return the index of the river network that is corresponding to the outlet
#' (index points to the row in the river_network_shapefile)
#' @export
#'
#' @examples
#'
get_outlet <- function(river_network, catchment_mask, distance) {
  catchment_mask %>%
    st_union() %>%
    st_cast("LINESTRING") %>%
    st_is_within_distance(river_network, .,
                          dist = distance, sparse = FALSE) %>%
    match(TRUE, .)
}

#' Function to create an object of class igraph with the river network attributes and
#' the right directionality
#'
#' @param network_links a point shapefile containing the river joins and dams
#' (must be of class sf)
#' @param river_net_simplified a polyline shapefile containing the river
#' network already sliced (must be of class sf). Must contain an ID column named 'NodeID'.
#' @param outlet integer indicating the reach in river_net_simplified that is
#' the outlet of the catchment
#'
#' @return 'igraph' object of the river network
#' @export
#'
#' @examples
#'
create_network <- function(network_links, river_net_simplified, outlet) {
  
  # - use spatial join to get information about the links
  # - create a distance matrics in long format
  # - create temporaty network: it includes triangular cliques corresponding
  # to the joints
  # - loop over the triangles to remove the edges that are not good for having a
  # nicely directed network
  # - create a new undirected graph without loops
  # - define the outlet
  # - create a directed graph by setting directions based on the distance to the outlet
  # Check where river_net_simplified is overlapping with network links,
  # retain the information and save to network_links
  
  network_links_df <- st_is_within_distance(network_links, river_net_simplified,
                                            dist = 0.01) %>%
    lapply(.,
           FUN = function(x){data.frame("from" = x[1], "to" = x[2], "to2" = x[3]) }) %>%
    do.call(rbind,.) %>%
    cbind(network_links %>% st_drop_geometry())
  
  # Get a full distance matrix
  # - note: directions are messed up!
  # - note: for the joints I am creating triangles (i.e. there are 3 nodes and 3 links):
  # this is to be corrected afterwards when I decide the directionality
  full_net_links_df <- rbind(
    network_links_df %>%
      filter(!is.na(to2)) %>%
      dplyr::select(-to2) %>%
      rename(from = from, to = to),
    network_links_df %>%
      filter(!is.na(to2)) %>%
      dplyr::select(-to) %>%
      rename(from = from, to = to2),
    network_links_df %>%
      filter(!is.na(to2)) %>%
      dplyr::select(-from) %>%
      rename(from = to, to = to2) ,
    network_links_df %>%
      filter(is.na(to2)) %>%
      dplyr::select(-to2)
  ) %>%
    #dplyr::select(from, to, type, id_dam, pass_u, pass_d) %>%
    mutate(id_links = 1:nrow(.))
  
  # Create vertices vector
  vertices <- river_net_simplified %>% 
    st_drop_geometry %>%
    mutate(name = NodeID) %>%
    mutate(length = as.numeric(length))
  
  # Create a graph based on this distance matrix and adjust directions
  # note: add vertex name so it's easier to identify them in the next steps
  river_temp_2 <- igraph::graph_from_data_frame(
    d = full_net_links_df,
    v = vertices,
    directed = FALSE) %>%
    igraph::simplify(remove.loops = FALSE, remove.multiple = TRUE, edge.attr.comb="first")
  
  # Remove triangles from graph: this way the joints are keeping the right directionality
  cliq3 <- cliques(river_temp_2, 3, 3)
  links_to_remove <- list()
  for (i in 1:length(cliq3)){
    full_dist <- distances(river_temp_2, v =cliq3[[i]], to = outlet)
    links_to_remove[[i]] <- data.frame(
      "from" = rownames(full_dist)[full_dist != min(full_dist)],
      "to" = rownames(full_dist)[full_dist != min(full_dist)] %>% rev)
  }
  links_to_remove <- do.call(rbind, links_to_remove)
  
  # Create a graph with the edges to remove and subtract it from the original graph
  graph_to_remove <- graph_from_data_frame(d = links_to_remove,
                                           directed = FALSE)
  river_temp <- difference(river_temp_2, graph_to_remove )
  
  # Make the graph directional
  # the algorithms checks edges distances to outlet and assigns direction
  river_temp_df <- igraph::as_data_frame(river_temp, what = "edges") %>%
    mutate(from = as.numeric(from), to = as.numeric(to))
  out <- list()
  for (iii in 1:nrow(river_temp_df)) {
    df_iter <- river_temp_df[iii, ]
    delta <- distances(river_temp, v = df_iter$from, to = outlet) -
      distances(river_temp, v = df_iter$to, to = outlet)
    if (delta > 0) {from = df_iter$from; to = df_iter$to}
    if (delta < 0) {from = df_iter$to; to = df_iter$from}
    out[[iii]] <- data.frame("from" = from, "to" = to,
                             "type" = df_iter$type, 
                             "id_links" = df_iter$id_links, 
                             "id_barrier"=df_iter$id_barrier,
                             "pass_u" = df_iter$pass_u,
                             "pass_d" = df_iter$pass_d)
  }
  
  river_graph_df <- do.call(rbind, out)
  river_graph <- graph_from_data_frame(
    d = river_graph_df,
    directed = TRUE,
    v = vertices)
  
  # Return result
  return(river_graph)
}


#' Function to snap points to a dense polyline
#'
#' @param x a point shapefile that contains the points to be snapped
#' (must be of class sf)
#' @param y a polyline shapefile that contains the river network
#' (must be of class sf)
#' @param max_dist maximum distance for snapping points
#'
#' @return a data frame that mirrors x, but with the geometries snapped to y
#' (the result of st_distance(x, y) is a vector of 0s)
#' @export
#' 
#' @details this function needs a dense polyline shapefile
# (i.e. a shapefile that has small -short- lines).
# This shapefile can be built in ArcGIS using the functions to generate
# equally spaced points along a polyline and then to
# split line at points (points should be sampled with 100 - 500 m intervals )
#'
#' @examples
#' 
snap_to_river <- function(x, y, max_dist = 1000){
  
  ## NOTE: This function is needed because sf is not good at snapping
  
  # Simplify y to check distance
  y_simple <- y %>% st_union()
  # Make the river shapefile dense: extract points at each vertex
  y_points <- y %>%
    st_as_sf %>%
    st_cast("POINT") %>%
    mutate(id = 1:nrow(.))
  # Make the river shapefile dense: add splits at each vertex
  y_complex <- lwgeom::st_split(y, y_points) %>%
    st_collection_extract(.,"LINESTRING") %>%
    st_as_sf() %>%
    mutate(length = st_length(.))
  ## Check the distribution length of the simplified reaches
  #hist(y_complex$length, nclass = 300)
  # Initialize output list
  out <- list()
  # Loop over x rows
  for (i in 1:nrow(x)){
    # Select the point that will be snapped
    x_i.snap <- x[i,]
    # Extract the nearest feature
    nf <- y_complex[st_nearest_feature(x[i,], y_complex),]
    # Check the distance between the point to snap and the nearest feature
    # if yes, then snap, otherwise do not snap and keep the original point
    if ( as.numeric(st_distance(x[i,], nf)) < max_dist) {
      # # Snap the point to the nearest feature NOTE: st_snap is not working properly
      # x_i.snap <- st_snap(x[i,], nf %>% st_cast("POINT"), tolerance = max_dist)
      # Transform the nearest polyline to points
      nf.points <- st_cast(nf, "POINT")
      # Select the point that has the minimum distance to the point to snap
      nf.points.min <- nf.points[ which.min(st_distance(x[i,],nf.points)),]
      # Substitute the geometry of the point to snap with
      # the geometry of the closest point in the nearest feature (nf) object
      x_i.snap$geometry <- nf.points.min$geometry
      # # Check the distance
      # st_distance(x_i.snap, y_simple)
    }
    # Create output data frame
    out[[i]] <- x_i.snap
    #cat(paste0(i, " "))
  }
  # out_x should contain the same metadata of dams
  out_x <- do.call(rbind, out)
  return(out_x)
}


#' Check if dams are located on the headwaters of the network
#'
#' @param dams a sf point shapefile that is snapped to shape
#' @param shape a sf polyline shapefile
#'
#' @return a data.frame that contains all the attributes in the dams shapefile plus two new columns. 
#' The new columns are: 'n_intersections', reporting how many segments of the polyline are touching the dam, 
#' and dam_headwater that flags dams that are located in the headwaters (if value is TRUE). In this column, 
#' a TRUE value identifies dams that should be removed from the analysis.
#' @export
#' 
#' @description this functions identifies dams that are located in the headwaters. The workflow is as follows:
#' 1) a circular buffer is created around each dam; 2) sf::st_intersect is used to check how many sections of the 
#' river polyline intersect the buffer; 3) if the dam does not have any section of the polyline upstream (i.e. the 
#' buffer intersects the polyline only once), then it is located in the headwaters. These dams are flagged in the 
#' output so the user can remove them
#'
#' @examples
headwaters_dam <- function(dams, shape){
  
  # Create circular buffer around dams
  dams_buffer <- st_buffer(dams, 1)
  
  # Break shapefile at points
  shape_points <- shape %>%
    st_as_sf %>%
    st_cast("POINT") %>%
    mutate(id = 1:nrow(.))
  
  shape_complex <- lwgeom::st_split(shape, shape_points) %>%
    st_collection_extract(.,"LINESTRING") %>%
    st_as_sf()
  
  # Intersect with polyline
  intersections_mat <- st_intersects(dams_buffer, shape_complex)
  
  # Check if some element in intersection_mat is smaller than 2
  # This means that the buffer is crossed only one time by the river network,
  # meaning that the dam does not have an upstream segment
  
  dams_df <- dams %>% 
    st_drop_geometry() %>% 
    mutate(
      n_intersections = do.call(rbind, lapply(intersections_mat, FUN = length)),
      flag_headwater = ifelse(n_intersections == 1, TRUE, FALSE))
  
  return(dams_df)
  
}


#' Check if there are issues with confluences in hydrosheds shapefile 
#'
#' @param shape a sf polyline
#'
#' @return a sf point object with the confluences and two additional fields: n_confluences, containing the number
#' of reaches that enter and exit the confluence, and flag_confluences 
#' that flags the 'suspect' confluences with a TRUE value
#' @export
#'
#' @examples
multiple_confluences <- function(shape){
  
  # Break shapefile at points
  river_to_points <- shape %>%
    mutate(NodeID = 1:nrow(.)) %>%
    st_as_sf %>%
    st_cast("POINT") %>%
    mutate(id = 1:nrow(.))
  
  joins_selection <- river_to_points %>%
    st_equals() %>%
    Filter(function(x){length(x) > 2}, .) %>%
    lapply(., FUN = min) %>%
    unlist() %>%
    unique()
  
  river_joins <- river_to_points %>%
    dplyr::filter(id %in% joins_selection) %>%
    dplyr::select(NodeID) %>% 
    st_as_sf
  
  # Create buffer around shape_points
  river_joins_buffer <- river_joins %>%
    st_buffer(1)
  
  # Intersect with polyline
  intersections_mat <- st_intersects(river_joins_buffer, shape)
  
  # Add information to the joints shapefile
  river_joins$n_confluences <-  as.vector(do.call(rbind, lapply(intersections_mat, FUN = length)) )
  river_joins$flag_confluences <- ifelse(river_joins$n_confluences <=3, FALSE, TRUE)
  
  return(river_joins)
  
}


#' Detects gaps in the river network polyline that might cause the function network_creation to fail
#'
#' @param shape the river network shapefile to be used as input to the network_creation function
#'
#' @return a point shapefile with the position of the gaps in the river network
#'
gaps_detection <- function(shape){
  
  # Break shapefile at points
  river_to_points <- shape %>%
    st_as_sf %>%
    st_cast("POINT") %>%
    mutate(id = 1:nrow(.))
  
  # Retain all confluences, both good and 'broken' ones
  joins_selection <- river_to_points %>%
    st_is_within_distance(dist = 0.0001) %>%
    Filter(function(x){length(x) > 2 }, .) %>%
    lapply(., FUN = min) %>%
    unlist() %>%
    unique()
  
  all_river_joins <- river_to_points %>%
    filter(id %in% joins_selection)
  
  # Select casted points that are next to each confluence
  close_points <- st_is_within_distance(all_river_joins, river_to_points, dist = 1)
  
  # Loop over each points triplet and calculate the distance
  out_dist <- list()
  for (i in 1:length(close_points)){
    out_dist[[i]] <- river_to_points %>%
      filter(id %in% close_points[[i]]) %>%
      st_distance() %>%
      max()
  }
  
  # Problematic points are extracted
  problematic_points <- close_points[which(out_dist > 0)] %>%
    unlist() %>%
    unique()
  
  points_out <- river_to_points %>%
    filter(id %in% problematic_points)
  
  return(points_out)
  
}


#' Creates a river network shapefile with the component attribute that 
#' can be useful for identifying disconnected chunks
#'
#' @param network_links a point shapefile containing the river joins and dams
#' (must be of class sf)
#' @param river_net_simplified a polyline shapefile containing the river
#' network already sliced (must be of class sf).  Must contain an ID column named 'NodeID'.
#' @return sf object that copies river_net_simplified with an additional 
#' 'component' field that can be plotted to spot isolated components
#' @export
#'
#' @examples
#'
check_components <- function(network_links, river_net_simplified) {
  
  # - use spatial join to get information about the links
  # - create a distance matrics in long format
  # - create temporaty network: it includes triangular cliques corresponding
  # to the joints
  # - loop over the triangles to remove the edges that are not good for having a
  # nicely directed network
  # - create a new undirected graph without loops
  # - extract the graph components and join with original shapefile
  
  network_links_df <- st_is_within_distance(network_links, 
                                            river_net_simplified,
                                            dist = 0.01) %>%
    lapply(.,
           FUN = function(x){data.frame("from" = x[1], "to" = x[2], "to2" = x[3]) }) %>%
    do.call(rbind,.) %>%
    cbind(network_links %>% st_drop_geometry())
  
  # Get a full distance matrix
  # - note: directions are messed up!
  # - note: for the joints I am creating triangles (i.e. there are 3 nodes and 3 links):
  # this is to be corrected afterwards when I decide the directionality
  full_net_links_df <- rbind(
    network_links_df %>%
      filter(!is.na(to2)) %>%
      dplyr::select(-to2) %>%
      rename(from = from, to = to),
    network_links_df %>%
      filter(!is.na(to2)) %>%
      dplyr::select(-to) %>%
      rename(from = from, to = to2),
    network_links_df %>%
      filter(!is.na(to2)) %>%
      dplyr::select(-from) %>%
      rename(from = to, to = to2) ,
    network_links_df %>%
      filter(is.na(to2)) %>%
      dplyr::select(-to2)
  ) %>%
    #dplyr::select(from, to, type, id_dam, pass_u, pass_d) %>%
    mutate(id_links = 1:nrow(.))
  
  # Create vertices vector
  vertices <- river_net_simplified %>% 
    st_drop_geometry %>%
    mutate(name = 1:nrow(.)) %>%
    mutate(length = as.numeric(length))
  
  # Create a graph based on this distance matrix and adjust directions
  # note: add vertex name so it's easier to identify them in the next steps
  river_temp_2 <- igraph::graph_from_data_frame(
    d = full_net_links_df,
    v = vertices,
    directed = FALSE) %>%
    igraph::simplify(remove.loops = FALSE, remove.multiple = TRUE, edge.attr.comb="first")
  
  # Get the components of river_temp
  river_net_simplified_comp <- river_net_simplified %>%
    mutate(NodeID = 1:nrow(.)) %>%
    dplyr::select(NodeID)
  river_net_simplified_comp$component <- as.vector(components(river_temp_2)$membership)
  
  # Return output
  return(river_net_simplified_comp)
  
}

References and key literature

Baldan, D., Cunillera-Montcusí, D., Funk, A., & Hein, T. (2022). ‘Riverconn’: An R package to assess river connectivity indices. Environmental Modelling & Software, 105470.

Lehner, B., Verdin, K., Jarvis, A. (2008): New global hydrography derived from spaceborne elevation data. Eos, Transactions, AGU, 89(10): 93-94.

Lehner, B., Grill G. (2013): Global river hydrography and network routing: baseline data and new approaches to study the world’s large river systems. Hydrological Processes, 27(15): 2171–2186. Data is available at www.hydrosheds.org.

Belletti, Barbara, et al. “More than one million barriers fragment Europe’s rivers.” Nature 588.7838 (2020): 436-441.

Verdin, K. L., & Verdin, J. P. (1999). A topological system for delineation and codification of the Earth’s river basins. Journal of Hydrology, 218(1-2), 1-12.

Cote, D., Kehler, D. G., Bourne, C., & Wiersma, Y. F. (2009). A new measure of longitudinal connectivity for stream networks. Landscape Ecology, 24(1), 101-113.

Kolaczyk, E. D., & Csárdi, G. (2014). Statistical analysis of network data with R (Vol. 65). New York: Springer.

Jumani, S., Deitch, M. J., Kaplan, D., Anderson, E. P., Krishnaswamy, J., Lecours, V., & Whiles, M. R. (2020). River fragmentation and flow alteration metrics: a review of methods and directions for future research. Environmental Research Letters.