If you’re a data manager for a lab that conducts fish telemetry, you are likely balancing a number of projects at any one time. That can make it hard to keep track of what’s new when an OTN data push occurs.

For example, I have 12 projects for which I am responsible.

library(matos)

projects <- list_my_projects()

head(projects)
#>                                             name number
#> 33      Maryland Department of Natural Resources     90
#> 46              Navy Kennebec ME Telemetry Array    192
#> 51         NCBO-MD DNR Chesapeake Backbone North    181
#> 52           NCBO-VMRC Chesapeake Backbone South    164
#> 118 UMCES Black Sea Bass & Offshore Construction     97
#> 119          UMCES BOEM Marine Mammal Monitoring    242
#>                                                 url
#> 33   https://matos.asascience.com/project/detail/90
#> 46  https://matos.asascience.com/project/detail/192
#> 51  https://matos.asascience.com/project/detail/181
#> 52  https://matos.asascience.com/project/detail/164
#> 118  https://matos.asascience.com/project/detail/97
#> 119 https://matos.asascience.com/project/detail/242

Parallel

I like to use the future family of packages to run things in parallel; specifically, future.apply. When you have quite a few projects, this speeds up pulling your files from MATOS quite a bit.

Listing

library(future.apply)

# Run tasks in parallel background R sessions;
# pass workers = n to plan() to limit the number of cores used
plan(multisession)

# List files in all of my projects
extraction_files <- future_lapply(projects$number, 
                                  function(x){
                                    list_extract_files(x)
                                    }
                                  )

# Bind together into one data frame
extraction_files <- do.call(rbind, extraction_files)

head(extraction_files)
#>   project            file_type detection_type detection_year
#> 1      90 Data Extraction File        matched           2015
#> 2      90 Data Extraction File        matched           2016
#> 3      90 Data Extraction File        matched           2017
#> 4      90 Data Extraction File        matched           2018
#> 5      90 Data Extraction File        matched           2019
#> 6      90 Data Extraction File        matched           2020
#>   upload_date                            file_name
#> 1  2023-03-21 mddnr1nr_matched_detections_2015.zip
#> 2  2023-03-21 mddnr1nr_matched_detections_2016.zip
#> 3  2023-03-21 mddnr1nr_matched_detections_2017.zip
#> 4  2023-03-21 mddnr1nr_matched_detections_2018.zip
#> 5  2023-03-21 mddnr1nr_matched_detections_2019.zip
#> 6  2023-07-06 mddnr1nr_matched_detections_2020.zip
#>                                                                url
#> 1 https://matos.asascience.com/projectfile/downloadExtraction/90_1
#> 2 https://matos.asascience.com/projectfile/downloadExtraction/90_2
#> 3 https://matos.asascience.com/projectfile/downloadExtraction/90_3
#> 4 https://matos.asascience.com/projectfile/downloadExtraction/90_4
#> 5 https://matos.asascience.com/projectfile/downloadExtraction/90_5
#> 6 https://matos.asascience.com/projectfile/downloadExtraction/90_6

That’s 142 files I need to keep track of! It really adds up.
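
Since the goal is to spot what’s new after a data push, one option is to filter on upload_date. A minimal sketch, assuming you keep note of when you last pulled files (the cutoff date below is hypothetical):

# Date of my last download (hypothetical)
last_push <- as.Date("2023-06-01")

# Keep only files uploaded since then
new_files <- extraction_files[as.Date(extraction_files$upload_date) > last_push, ]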

Downloading

If we want to download all of those files, we can do something similar. We just need to change the function we’re running in parallel to get_extract_file and provide it the URLs from the data frame we made via list_extract_files. I’ll download the first three files for demonstration purposes.

# Download and unzip the first three extraction files
future_lapply(extraction_files$url[1:3], 
              function(x){
                get_extract_file(url = x)
              }
)
#> ── Downloading files ──────────────────────────────────────────────
#> ✔ File(s) saved to:
#>    C:\Users\darpa2\Analysis\matos\vignettes\mddnr1nr_matched_detections_2015.zip
#> 
#> ── Unzipping files ────────────────────────────────────────────────
#> ✔ File(s) unzipped to:
#>    C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2015.csv
#>    C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt
#> 
#> ── Downloading files ──────────────────────────────────────────────
#> ✔ File(s) saved to:
#>    C:\Users\darpa2\Analysis\matos\vignettes\mddnr1nr_matched_detections_2016.zip
#> 
#> ── Unzipping files ────────────────────────────────────────────────
#> ✔ File(s) unzipped to:
#>    C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2016.csv
#>    C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt
#> 
#> ── Downloading files ──────────────────────────────────────────────
#> ✔ File(s) saved to:
#>    C:\Users\darpa2\Analysis\matos\vignettes\mddnr1nr_matched_detections_2017.zip
#> 
#> ── Unzipping files ────────────────────────────────────────────────
#> ✔ File(s) unzipped to:
#>    C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2017.csv
#>    C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt
#> 
#> [[1]]
#> [1] "C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2015.csv"
#> [2] "C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt"                
#> 
#> [[2]]
#> [1] "C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2016.csv"
#> [2] "C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt"                
#> 
#> [[3]]
#> [1] "C:/Users/darpa2/Analysis/matos/vignettes/mddnr1nr_matched_detections_2017.csv"
#> [2] "C:/Users/darpa2/Analysis/matos/vignettes/data_description.txt"

Summarizing

We can do the same for the receiver and transmitter push summaries by looping over our project numbers. For me, this will create 24 reports! Still a lot, but quite a bit easier to digest than millions of detections spread over 142 files.

# Receiver push summaries for the first two projects
future_lapply(projects$number[1:2], 
              function(x){
                matos_receiver_summary(x)
              }
)

# Transmitter (tag) push summaries for the same projects
future_lapply(projects$number[1:2], 
              function(x){
                matos_tag_summary(x)
              }
)
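
When everything has finished, we can shut down the parallel workers that plan(multisession) started:

plan(sequential)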