Blog | Moja Global Community

Google Summer of Code 2022 - MLOps for Reproducible Science

September 11, 2022 · 16 min read

GSoC'22 Mentee

Some technical background

Full Lands Integration Tool (FLINT) is an open-source software technology designed for measuring, reporting and verifying (MRV) greenhouse gas emissions and removals from forestry, agriculture and other land uses. FLINT is not itself an MRV system; it provides a framework for progressively building MRV systems for specific cases.

The Generic Carbon Budget Model (GCBM) is a tool developed to assess and report the cumulative effects of anthropogenic and natural disturbances on forests. The GCBM is a set of modules developed by the Canadian Forest Service (CFS) to run on top of FLINT. This set of modules describes several forest carbon pools. GCBM simulations run at an annual time step. As inputs, the GCBM takes a combination of spatially explicit datasets concerning the forest (with information like tree species or the location of forest types), age, climate, and disturbances, along with non-spatial parameters such as volume-to-biomass conversion coefficients or yield curves. A Python tool is used to prepare the spatial inputs from raster to vector format and an SQLite database handles the non-spatial data [1].

* carbon pool: represents a reservoir of carbon that can be stored or released; ** disturbances: unplanned events (e.g. wildfire) that affect carbon pools

Project Description 📌

We applied MLOps techniques and tools to a complex scientific workflow as part of a community-led, distributed carbon modelling platform. These techniques ensure reproducibility, which is the cornerstone of good science.

In my proposal, the main concept was to use CI/CD tools for Machine Learning/Data Science projects to automate the complicated and technically demanding tasks, and the corresponding reports, that a moja-global contributor might face. I proposed a DVC repository (i.e. remote storage) to cache significant logs and outputs of simulations, which I achieved with the DVC pipeline for the GCBM.Belize and GCBM.Colombia repositories. Furthermore, I built a cloud storage repository on which we can use DVC's and CML's features to track, cache, compare and store flint-ready datasets. Spatial datasets have a lot of variability, so a standardisation process is needed to achieve reproducibility; connecting the storage to the git repository that processes the datasets makes them reproducible and interactive. I believe I achieved that with the work I've done on the Land Sector repository. Last but not least, I created a CML Action that generates a small summary/report from the execution of a simulation for the FLINT.Cloud platform. It serves as a numerical integration test that prevents breaking changes while the platform is under active development, and as a useful template for new users to deploy their own FLINT services.

** flint-ready: spatial datasets converted into a common format and coordinate system

Tech Stacks Used

Purpose	Tools and technologies used
CML Action in FLINT.Cloud	CML, GitHub Actions, Docker
DVC pipeline in GCBM.Belize	DVC, Python, R
Land Sector Dataset Processor	DVC, CML, geopandas, gdal, GitHub Actions

Mentors

I owe huge thanks to my mentors Andrew O'Reilly-Nugent and Simple Shell for supporting me throughout the whole program and providing the appropriate feedback to me so I can improve my contributions.

Fellow contributors

Also huge thanks to community members Namya, Padmaja, Harsh, and all the other contributors that I worked with for not only providing support on technical issues but welcoming me to the community.

What do I believe I achieved in a general aspect?

In my proposal, my ideas and deliverables were described in generic and vague terms because I had not yet fully understood how several aspects of the code base worked. After doing enough research and learning how the code works, I believe I redefined my deliverables more realistically.

DVC pipelines on the GCBM module

GCBM.Belize PR link: https://github.com/moja-global/GCBM.Belize/pull/14

GCBM.Belize repository

GCBM.Belize was developed as a case study for the GCBM in Belize. The repository can be seen as an example of how the GCBM works at national scale, and it also gives new contributors a technical example of how the GCBM works, particularly:

How the datasets are prepared
How the simulation runs
How to interact with the results (analysis and reports)

Some existing issues:

The whole workflow was configured to run through Windows Batch scripts, and the steps were neither connected to each other nor organised in any structure, which made executing them and analysing the results more complicated.
The Belize repository is a good case for experimenting with the GCBM by modifying specific parameters of the simulation and comparing the different outputs they produce. There was no procedure for doing that.

A little about DVC

Data Version Control (DVC) is a dataset versioning tool that takes advantage of existing engineering tools like Git. So we can say that DVC is Git for large datasets that cannot be managed by Git itself due to size limits.

What was the goal?

Integrate a system-agnostic pipeline to execute the complete workflow (preprocessing, simulation and postprocessing) that tracks and stores the outputs in remote storage.

What does the pipeline offer to the user?

The cases of GCBM.Belize and GCBM.Colombia were developed to be executed only on Windows systems, but using DVC's functionalities I set up the pipeline to be system-agnostic.
Before the DVC pipeline was established, the phases of the workflow had to be executed manually; with DVC, anyone can execute the whole workflow with only one command (dvc repro).
The outputs the GCBM module generates are for the most part .tif files, and the postprocessing step generates some metrics and plots. All these outputs are listed and tracked by DVC using md5 hashes. I used DVC's functionality for storing files in remote storage and set up a Google Drive repository that stores them.
Furthermore, I used DVC's features for metrics and plot files to track these kinds of outputs from the workflow. This way, when someone creates another version of the GCBM, the dvc metrics diff command can be used to compare the metrics of the standard GCBM version with those of the new one (and use the output in a potential report as well).
If someone has created and executed different versions of the established pipeline, the dvc exp commands can be used to list, compare and display metrics, plots, and the output tifs (and use these outputs in a potential report as well).
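To illustrate what tracking an output by its md5 hash means, here is a minimal sketch in Python. It is not DVC's own code: the helper md5_of_file and the file name output.tif are made up for the example, and DVC's real bookkeeping (for instance for whole directories) involves more than this.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a simulation output; the name is hypothetical.
    output = Path(tmp) / "output.tif"
    output.write_bytes(b"first simulation run")
    before = md5_of_file(output)

    # A different run produces different bytes, hence a different hash.
    output.write_bytes(b"second simulation run")
    after = md5_of_file(output)

changed = before != after
print(changed)
```

Comparing the stored hash with the current one is enough to tell whether an output has changed, without keeping the file itself in git.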

How does the DVC pipeline work?

Pipelines in DVC are developed in YAML syntax and are divided into stages. For each stage to be appropriately executed we need to define:

the working directory
the command to be executed
the stage's dependencies (i.e. the files that are used/affected by the stage)
the stage's outputs (optional)
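As a rough illustration of the four fields above, the sketch below builds one stage as a Python dictionary using the key names found in a dvc.yaml file (wdir, cmd, deps, outs). The stage name, command and paths are placeholders and are not the ones used in GCBM.Belize.

```python
import json

# One hypothetical stage; only the key names come from dvc.yaml.
stage = {
    "wdir": "Standalone_GCBM",                     # the working directory
    "cmd": "python run_simulation.py",             # the command to be executed
    "deps": ["run_simulation.py", "gcbm_config"],  # files the stage depends on
    "outs": ["../logs/simulation.log"],            # optional outputs
}

pipeline = {"stages": {"run_simulation": stage}}

# Printed as JSON only to show the nesting; the real dvc.yaml
# expresses the same structure in YAML syntax.
pipeline_text = json.dumps(pipeline, indent=2)
print(pipeline_text)
```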

The GCBM pipeline consists of 12 stages:

Tiler → processes and defines the spatial layers to be used in the simulation
Recliner2GCBM → generates the gcbm_input database
modify_<type>_parameters → scripts that apply preprocessing to the input database so that it fits specific GCBM cases better
UpdateGCBMConfiguration → updates the simulation configuration based on the contents of the Standalone_GCBM/template directory
run_gcbm → runs the simulation
create_tiffs → generates the compiled spatial output
compile_results → generates the output database, a more user-friendly format to show the output results
post_processing → creates the 3 figures that show the distribution of the four unique indicators throughout the simulation, along with metrics files holding the mean values of 3 different carbon stock metrics over 3 periods (1 period = 50 years) for every type of indicator and every type of LifeZone
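The kind of aggregation the post_processing stage performs can be sketched as follows. This is only an illustration with made-up yearly values: the column name pool_tc_per_ha mirrors the metric names shown in the usage examples, while the helper functions and the rule that assigns a boundary year to the later period are assumptions.

```python
import json
from statistics import mean

# Made-up yearly values for one indicator and one LifeZone.
rows = [
    {"year": 1900, "pool_tc_per_ha": 8.0},
    {"year": 1925, "pool_tc_per_ha": 10.0},
    {"year": 1950, "pool_tc_per_ha": 12.0},
    {"year": 1975, "pool_tc_per_ha": 14.0},
]


def period_label(year, start=1900, length=50):
    """Map a year to a label such as '1900-1950' (boundary rule is assumed)."""
    first = start + ((year - start) // length) * length
    return f"{first}-{first + length}"


def mean_per_period(rows, column):
    """Group the rows by period and average one column."""
    grouped = {}
    for row in rows:
        grouped.setdefault(period_label(row["year"]), []).append(row[column])
    return {period: mean(values) for period, values in grouped.items()}


metrics = mean_per_period(rows, "pool_tc_per_ha")
print(json.dumps(metrics, indent=2))
```

Writing one such dictionary per period, indicator and LifeZone to a JSON file gives files that DVC can treat as metrics.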

Another important fact is that DVC does not, by default, assign any specific order to the execution of the pipeline. An order can be imposed, however, by listing the outputs of the i-th stage as dependencies of the (i+1)-th stage.
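The idea of deriving the execution order from outputs and dependencies can be sketched with Python's standard graphlib module. This is not how DVC is implemented internally, only an illustration of the principle; the stage names follow the pipeline, but the file paths are invented.

```python
from graphlib import TopologicalSorter

# Shortened, hypothetical stage definitions: only deps and outs matter here.
stages = {
    "run_gcbm": {"deps": ["gcbm_input.db", "layers/tiled"], "outs": ["gcbm_output"]},
    "tiler": {"deps": ["layers/raw"], "outs": ["layers/tiled"]},
    "recliner2gcbm": {"deps": ["input_database"], "outs": ["gcbm_input.db"]},
    "create_tiffs": {"deps": ["gcbm_output"], "outs": ["processed_output/spatial"]},
}


def execution_order(stages):
    """Order stages so that each runs after the stages producing its deps."""
    producer = {out: name for name, spec in stages.items() for out in spec["outs"]}
    graph = {
        name: {producer[dep] for dep in spec["deps"] if dep in producer}
        for name, spec in stages.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(stages)
print(order)
```

Even though run_gcbm is written first, it ends up after tiler and recliner2gcbm because it depends on their outputs. Stages that do not depend on each other may come out in either order.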

The pipeline lets professionals outside the programming world interact with simulations and modules without having to dig into code or other difficult technical tasks, so they can focus on analyzing the results.

How can it be utilized in the future?

DVC not only provides a blueprint to organize the workflow but also offers the ability to determine the dependencies and outputs of each step of the workflow. In most of the cases that use either FLINT.core or other modules, the stages of the workflow are similar (create a simulation and its configs, run it, compile the results and run some postprocessing scripts), so this pipeline can be used, with some modifications, as a blueprint/guide for integrating the same functionalities into other FLINT or GCBM cases.

Usage examples

You can list the stages and each stage's outputs by typing:

$ dvc stage list
tiler                       Outputs ..\..\logs\tiler_log.txt
recliner2gcbm_x64           Outputs logs\recliner_log.txt
add_species_vol_to_bio      Outputs logs\add_species_vol_to_bio.log
modify_root_parameters      Outputs logs\modify_root_parameters.log
modify_decay_parameters     Outputs logs\modify_decay_parameters.log
modify_turnover_parameters  Outputs logs\modify_turnover_parameters.log
modify_spinup_parameters    Outputs logs\modify_spinup_parameters.log
update_GCBM_configuration   Outputs ..\logs\update_gcbm_config.log
run_gcbm                    Outputs ..\logs\Moja_Debug.log
create_tiffs                Outputs ..\..\logs\create_tiffs.log, ..\..\processed_output\spatial
compile_results             Outputs ..\..\logs\compile_results.log
post_processing             Reports metrics\1900-1950_Deadwood_Tropical_Dry.json, metrics\1900-1950_Deadwoo…

You can display the metrics by typing:

$ dvc metrics show
Path                                                          area_sum_mean    pool_tc_per_ha_mean    pool_tc_sum_mean
metrics\1900-1950_Deadwood_Tropical_Dry.json                  1142790.3823     8.46574                9674571.56838   
metrics\1900-1950_Deadwood_Tropical_Moist.json                608498.40961     19.67003               11969184.86104  
metrics\1900-1950_Deadwood_Tropical_Premontane_Wet.json       417245.76511     23.20835               9683586.49465   
metrics\1900-1950_Litter_Tropical_Dry.json                    1142790.3823     7.63421                8724307.19108   
metrics\1900-1950_Litter_Tropical_Moist.json                  608498.40961     15.89514               9672169.16877   
metrics\1900-1950_Litter_Tropical_Premontane_Wet.json         417245.76511     19.54048               8153183.07648   
metrics\1900-1950_Soil Carbon_Tropical_Dry.json               1142790.3823     18.18886               20786056.03202  
metrics\1900-1950_Soil Carbon_Tropical_Moist.json             608498.40961     69.2994                42168571.67542  
metrics\1900-1950_Soil Carbon_Tropical_Premontane_Wet.json    417245.76511     73.05651               30482518.25375    
...

You can test the pipeline by typing:

$ dvc repro # or 
$ dvc exp run

You can push the outputs the pipeline generated by typing:

$ dvc push

For testing purposes, I used a personal Google Drive storage to upload the outputs here
Here you can see a video showcase of the whole pipeline being executed with only one command
There is a guide/description that explains how to set up, configure and run DVC and the pipeline in more technical detail here

Processing the Land Sector Datasets

PR Link: https://github.com/radistoubalidis/Land_Sector_Datasets/pull/3

The Land Sector Datasets repository

The Land Sector Datasets repository consists of datasets and their metadata for land sector management use in FLINT. In more detail, this repository includes Jupyter Notebooks that contain the license and metadata along with other information, and the processing code to get the datasets (which are in raster format) into vector format (.tif or GeoJSON) to be flint-ready.

** flint-ready: prepare the dataset in vector format

What's the issue?

There is also a Google Drive remote storage maintained by moja-global's contributors, where the processed (vector format) datasets are stored. However, the Land Sector git repository and the Google Drive storage are not directly connected, meaning there is no specific procedure that ensures the reproducibility of those datasets, and contributors can only use one version of each one.

What's the goal?

The goal here is to establish a framework that guarantees reproducibility while curating the datasets and that also reduces the dependencies required to process them. In general, the aim is to make the Google Drive storage directly connected to, and interactive with, the GitHub repository.

What was my approach?

My first thought was that this case is very well suited to using DVC's features to track datasets through Google Drive, but that didn't seem enough. After discussing with the community and analyzing the situation, I broke the issue down into these concrete tasks:

Implement a general processor capable of processing the datasets into flint-ready formats
Track the processing steps (using DVC) and store the flint-ready datasets in the Google Drive storage
Provide a health check on the processed datasets when someone makes changes to the processing steps

Since the repository consists of many different datasets, my mentors suggested I use the Harmonized World Soil Database dataset as a use case. My first task was to implement processing code that takes the dataset from raster into vector (.tif) format. I achieved that by writing a script that uses the gdal Python library to convert the raster dataset into vector format and restructure it to be flint-ready. Then, I created a DVC pipeline that runs the processing script and lists the vector dataset as its output. The pipeline matters for the future as well, when we may need to process multiple datasets and push them to remote storage with only one command. The third and final step was to create a GitHub Action that executes the pipeline and pushes the output datasets to the Google Drive storage. This action is triggered only when a commit changes the dataset processing code, thus providing a health check (i.e. that the changes in the processing code generate a reproducible and flint-ready dataset). To show how these deliverables work together, here is an example workflow for a potential contributor:

Let's say I want to use the Harmonized World Soil Database dataset, which is stored in GeoTIFF format in moja-global's Google Drive storage, but I want to use it in GeoJSON format, so I make some changes to the processing script and commit them.
The health-check action is triggered and executes the pipeline.
After the action is completed, the processed dataset is pushed to the Google Drive storage and an auto-generated comment (using CML) is published (example here) that informs me whether my script generated a flint-ready dataset or not.
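The post does not describe how the health check decides whether a dataset is flint-ready, so the following is only a sketch under stated assumptions: the processed output is GeoJSON, and "flint-ready" is approximated by "a non-empty FeatureCollection in which every feature has a geometry". The function check_geojson and the sample data are hypothetical.

```python
import json


def check_geojson(text):
    """Return (ok, message) for a GeoJSON document given as text."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as error:
        return False, f"not valid JSON: {error.msg}"
    if not isinstance(data, dict) or data.get("type") != "FeatureCollection":
        return False, "top-level object is not a FeatureCollection"
    features = data.get("features") or []
    if not features:
        return False, "no features found"
    if any(not isinstance(f, dict) or f.get("geometry") is None for f in features):
        return False, "at least one feature has no geometry"
    return True, f"features with geometry: {len(features)}"


# Tiny made-up dataset standing in for a processed output.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"soil": "example"},
            "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
        },
    ],
})

ok, message = check_geojson(sample)
report = f"Dataset check: {'passed' if ok else 'failed'} ({message})"
print(report)
```

A line like the one printed here is the sort of text that could end up in the auto-generated comment.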

Notes

Most of the datasets in the repository are processed using arcpy, a Python package for geographical analysis that can only be used on Windows systems. As suggested by my mentors, I also worked on refactoring the processing code of some datasets that use arcpy so that it uses system-agnostic libraries/modules such as geopandas. I applied this idea to the Global Ecological Zones dataset, which I processed with geopandas instead of arcpy, so that it can now be processed on non-Windows systems too.
I believe that if we follow an OOP approach we can implement a general processor that would be able to handle multiple datasets from the Land Sector repository, so it can be used in other moja-global projects in the future.
As this deliverable is not merged yet, I used a personal Google Drive storage for testing my code, but the same principles can be applied to moja-global's Google Drive storage.

CML Action on FLINT.Cloud

PR Link: https://github.com/moja-global/FLINT.Cloud/pull/132

FLINT.Cloud Repository

The core goal of FLINT.Cloud is to build a continuous deployment pipeline that offers FLINT on cloud resources. It consists of 2 APIs that run different kinds of simulations. The APIs in FLINT.Cloud are configured inside Docker containers that include the required dependencies, such as the FLINT source code and other required software packages. The repository is going to work together with other repositories, because the community is also working on FLINT.UI, a FLINT frontend client for configuring simulations using the FLINT.Cloud APIs. In the FLINT.Cloud repository, every step of the workflow (creating simulation configs, running and analyzing results) is executed through manual requests to the APIs. So, after discussion with the community and mentors, the creation of a CI script that would run a simulation and auto-generate a report was suggested. This CI script would show new contributors how the APIs work and would also automate a part of the workflow, thus relieving the developer/researcher of some technically demanding tasks.

A little about CML

CML is an open-source CI/CD tool for Machine Learning or general Data Science projects. It can be used to track and provide auto-generated reports on development workflows and also can be configured to provide CI pipelines on cloud-hosted runners.

What was the original goal?

The original goal was to establish a GitHub Action that executes a benchmark simulation in FLINT.Cloud and provides an automated report on the simulation results using CML.

How does it work?

First of all, the action is triggered on pull requests with the simulation label. The action takes a benchmark simulation configuration and runs it using the rest_gcbm_api, which is wrapped inside a Docker container that includes the GCBM REST API along with all the dependencies required for the simulation. After the simulation ends, we use the CompileResults repo to prepare the compiled_gcbm_output database, on which SQL queries are run to provide information on the simulation's output. The results of these queries, together with any plots, can then be published by CML as a comment on the PR.
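The step from SQL queries to a report can be sketched as below. The table pool_results, its columns and its values are invented for the example and do not reflect the real schema of the compiled_gcbm_output database; an in-memory SQLite database stands in for the real file.

```python
import sqlite3

# In-memory stand-in for the compiled output database (hypothetical schema).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE pool_results (year INTEGER, pool TEXT, tonnes_c REAL)"
)
connection.executemany(
    "INSERT INTO pool_results VALUES (?, ?, ?)",
    [
        (2010, "Deadwood", 10.0),
        (2010, "Litter", 5.0),
        (2011, "Deadwood", 11.0),
        (2011, "Litter", 6.0),
    ],
)

# Total carbon per year across all pools.
rows = connection.execute(
    "SELECT year, SUM(tonnes_c) FROM pool_results GROUP BY year ORDER BY year"
).fetchall()
connection.close()

# Build a small Markdown table that a report could include.
lines = ["| Year | Total carbon (t) |", "| --- | --- |"]
lines += [f"| {year} | {total:.1f} |" for year, total in rows]
report_md = "\n".join(lines)
print(report_md)
```

Saving report_md to a Markdown file gives the kind of content that can then be posted on the pull request.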

What does the Action achieve?

An auto-generated simulation report when someone raises a PR with the simulation label. The generated report currently looks like this
A validation process confirming that the changes made in the specific PR will not break anything in the simulation run.

How and why we had to modify the Action

The FLINT.Cloud repository, and the GCBM REST API in particular, is still in the development phase, and the input configuration code changes daily. We stumbled upon some errors that couldn't be resolved, so it was decided to temporarily modify the CML Action to run the simulation straight from the FLINT CLI (which is included in the container) until the APIs reach their final form.

How can it be enriched?

After the CML Action PR was merged, my mentors suggested we could enrich the auto-generated report by also displaying (in Jupyter Notebook form) the code that generated any plots. They suggested the use of jupytext, a Python package for versioning and managing Jupyter Notebooks. It offers a variety of commands that convert between Python scripts, Markdown texts, and notebooks, so you can access the same code in any of these formats. To sum up, I used jupytext to convert the script that generates the report from py:percent format into Markdown so it can be attached to the report. I also raised a new PR for this addition.
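To give a feel for the py:percent format, here is a simplified, standard-library-only sketch that turns such a script into Markdown. It is not jupytext and does not use its API; it only handles the two most basic cell markers ("# %%" for code and "# %% [markdown]" for text), and the sample script is made up.

```python
FENCE = "`" * 3  # Markdown code fence, built here to keep the example tidy


def percent_script_to_markdown(source):
    """Convert a py:percent script into Markdown text (simplified)."""
    blocks = []
    kind, lines = None, []

    def flush():
        text = "\n".join(lines).strip("\n")
        if kind is None or not text:
            return  # nothing collected, or text before the first marker
        if kind == "markdown":
            # Markdown cells are stored as comments: drop the "# " prefix.
            cleaned = [
                line[2:] if line.startswith("# ") else line.lstrip("#")
                for line in text.split("\n")
            ]
            blocks.append("\n".join(cleaned))
        else:
            blocks.append(f"{FENCE}python\n{text}\n{FENCE}")

    for line in source.split("\n"):
        if line.startswith("# %%"):
            flush()
            kind = "markdown" if "[markdown]" in line else "code"
            lines = []
        else:
            lines.append(line)
    flush()
    return "\n\n".join(blocks)


script = "# %% [markdown]\n# # Simulation report\n\n# %%\ntotal = 1 + 1\nprint(total)\n"
markdown = percent_script_to_markdown(script)
print(markdown)
```

The real tool covers many more cases (cell metadata, raw cells, round-tripping back to notebooks), which is why it was used for the report instead of a hand-written converter like this one.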

Final Thoughts

My experience throughout the mentorship has been wonderful. I believe I learned a variety of things, from how to implement CI principles in non-traditional web projects to how to write cooperative code. I feel proud to be a part of a community with such team spirit, and I want to continue collaborating and making real contributions to moja-global.

References

[1] Shaw, C. H., et al. "Cumulative effects of natural and anthropogenic disturbances on the forest carbon balance in the oil sands region of Alberta, Canada; a pilot study (1985–2012)." Carbon Balance and Management 16.1 (2021): 1-18.

Google Summer of Code 2022 - Building UI Library for Moja Global

September 9, 2022 · 4 min read

Palak Sharma

GSoC'22 Intern

Project: Building UI library for moja global

Project Details	Useful Links
Mentor	Harsh Mishra, Gopinath Balakrishnan
Student	Palak Sharma
Github Repository for the project	GitHub
Npm Package	NPM
Documentation	Storybook

Project Description✍

Moja global is a not-for-profit, collaborative project that brings together a community of experts to develop open-source software that allows users to accurately and affordably estimate greenhouse gas emissions and removals from the AFOLU sector.

This year in Google Summer of Code 2022, I worked on building a fast, lightweight, user-friendly, and cross-browser-compatible UI library as a stand-alone dependency for moja global projects. The library contains ready-made code for components that are widely used across moja global's repositories, so future developers and users can use them in their projects just by installing the package and running two or three import commands😀.

The Moja global User-interface (UI) library developed under this project serves the following purposes🎉

It aims to bring forward an intuitive, consistent, and easy-to-use interface that can help our developers within the User-Interface working group and users to quickly accomplish their tasks.
It helps us mitigate popular UI-development issues like inconsistent user experience, performance issues, accessibility requirements, and more.
It improves the design & development workflow and meets the acceptable web accessibility requirements.
Using UI Library components is easy as most of the code is written in the documentation itself.

My Contributions🎯

The main aspects of my contributions are🎓

Contribution	Tools and Technologies used
Migration of current projects from Vue2 to Vue3	Vuejs
Designing	Figma
Implementation	Vuejs
Documentation	Storybook
Building	Rollup
Testing	Axe
Launch	Yarn
Integration of UI library into existing projects	UI library

Migration of existing project from Vue2 to Vue3 🔁

Initially, I worked on migrating the existing project from Vue version 2 to Vue version 3, to make it compatible with the UI library, which is based on Vue version 3.

These are the pull requests I made for the migration of the existing project

Designing🌸

After the migration, I worked on designing the components of the UI library according to the Moja Global brand guidelines. The designs can be found here

Implementation & Documentation📝

I did the implementation and documentation of the UI library simultaneously. I added various components to the UI library, each of them fully customizable and accessible.

These are the pull requests I made for the implementation and documentation of the UI library

Building👩‍💻

Building the UI library is the main task of the project. I used vue rollup to build the UI library.

Testing🔍

For testing the components of the UI library, I provided two options: testing in development mode and testing when a pull request is made.

Launch🚀

With the help of my mentor, the launch of the UI library has been automated. The launch happens as soon as the package version in package.json is changed.

Integration of UI library into existing projects🎊

We tested the UI library by integrating it into a fresh Vue 3 project as well as into an existing moja global project. It worked as expected in both. It is currently in use in FLINT UI.

Summary👩‍🎓

The overall experience of GSoC has been amazing and full of learning. I learned many things, like how to build a project from scratch and how to work with a large codebase. I would like to thank my mentors for always being there; I have learned a lot from them during the GSoC period and before🎀. I like the culture of moja global very much. All the mentors here are very friendly and very supportive. I learned the way open source works and the etiquette that should be followed while working on a real-life project. I want to continue contributing to moja global in the future and wish to see moja global reach new heights of success :)

GSoC: GCBM Simulation Editor - Status Report

September 9, 2022 · 4 min read

Yash Kandalkar

GSoC'22 Intern

Hello everyone! Hope you're doing well! This is the final blog of my GSoC journey. It'll contain a brief of everything that I have worked on during my term and what's left to do.

Mentors

A huge thanks to Andrew O'Reilly-Nugent and Harsh Mishra for helping me at every step of the project. I cannot thank them enough for encouraging me throughout the project and guiding me in the right direction.

Community Members

Special thanks to all these community members for helping me throughout the project: Padmaja, Namya, Shloka, Sanjay, Janvi, and Palak. I wouldn't have been able to finish the project without their help.

Community Bonding Period

I started working on the project by researching the different inputs required for running the GCBM Simulation. I gathered information from Andrew, Padmaja and Namya regarding the configs that users can edit on the web interface. During this period, we also decided to migrate the project from Vue 2 to Vue 3, as it is the latest version of Vue and supports more libraries. I worked on the migration process, updated all the old libraries to their Vue 3 compatible versions and updated some legacy code. I also started working on the new components and removing the old ones in the GCBM Simulation Editor.

Local Domain Configuration UI

Relevant PRs:

Week 1:

From the last week of the community bonding period, I started creating components for each configuration parameter. This week, I created the UI for the Modules configuration. Users can enable and disable (add or remove) modules that will be included in the run. Some modules, like the Decay Module, also contain configurable variables.

Modules

Relevant PRs:

Week 2:

In this week I worked on creating the Pools config. Users can edit different pool values and search the name of the pools in a search bar.

Pools

Search bar Pools

Relevant PRs: feat: add Pools config in GCBM run #325

Week 3:

In this week, I added a Create Simulation page which will be the entry point of the simulation run and added sub-menus in the Upload Section for different file types (disturbances, classifiers, input DB and miscellaneous).

Create Simulation UI

Relevant PRs:

chore: redid the file and menu structure #331

Week 4:

I started working on the UI for the Upload section. Here, the major work was the JSON config editor. A GUI editor was required to make it easier for non-developers to edit the JSON attributes. I used the vue3-json-editor library for this functionality.

Upload section

Relevant PRs: https://github.com/moja-global/FLINT-UI/pull/332

Week 5:

I added the UI for editing column names in the DB Editor. After the user uploads a database file, the server responds with the table and attribute names. The requirement was to give users the option to edit table and attribute names on the frontend. I added an Edit button near the database table which, when pressed, makes all the attribute names editable.

database editor

Relevant PRs:

feat: add UI for Spinup, Run simulation, DB Editor, Integrate upload endpoints #347

Week 6:

I connected some configurations, such as Local Domain, to the Vuex store. I added vue-persist so that the configurations made on the frontend persist between website reloads.

Relevant PRs:

feat: add UI for Import Simulation modal, connect LocalDomain to the vuex store #355

Week 7:

Added Import Simulation feature in the sidebar. Here, users can upload all the files required by the simulation. If configuration files are uploaded, they will be read using the FileReader API and the UI will be updated accordingly.

import simulation

Week 8:

Added the UI and functionality for changing table names in the db editor.

change table names

Week 9:

Worked on adding an Export Simulation feature so that the users can download the configurations in JSON format if they want to continue configuring the simulation later. These JSON configuration files can be uploaded in the Import Simulation feature, which will update the UI accordingly.

Week 10:

In this week, I worked on creating a Tour for the simulation, which will help new users understand the flow of the simulation editor.

Further Steps:

In the coming week, I will be writing tests for the components I created and documenting the features which will help new contributors understand the code and continue enhancing the project.

Progress so far - 2022 GSoC mentees

August 13, 2022 · 5 min read

Namya LG

GSoD'22 Intern

Yash Kandalkar, Radis Toubalidis and Palak Sharma are GSoC mentees for the year 2022.

Yash is working on building the GCBM Simulation Editor with the motive of simplifying the process of running the GCBM simulation and allowing users to configure the inputs supplied. Running a simulation broadly consists of creating a new simulation, uploading inputs, running the simulation and downloading the output on completion. FLINT UI is the frontend interface, while the backend is powered by FLINT.Cloud APIs. For each of the above-mentioned steps, there are API endpoints. In the initial phase, Yash migrated all the libraries used in the FLINT UI project to their Vue 3 compatible versions. On completion, he started developing the UI components for creating a new simulation and uploading input files. Currently, the inputs are classifiers, the input database and disturbances (not mandatory). Classifiers and disturbances are supplied as tiff files; the input database is a SQL database. The JSON files associated with the tiff files are generated at the backend. Yash worked on making UI components for configuring parameters like Local Domain, Modules, Pools, etc. present in the generated JSON files. The next step is to complete the UI development and API integration for the further stages of the simulation.

Radis is working on the project MLOps for reproducible science. The goal of this project is to streamline the workflow of data scientists on the FLINT.Cloud project by leveraging the combined potential of Data Version Control (DVC) and Continuous Machine Learning (CML). He is working on integrating a CML Action for FLINT.Cloud that runs the whole simulation workflow. The action runs whenever a new pull request with specific labels (i.e. run-simulation) is raised to the FLINT.Cloud repository. The action helps to report whether a particular simulation completed. After a simulation ends, a Python script retrieves the log files created by the moja.cli tool and uploads them on git as an artifact. A Python script included in the Compile Results repository generates the output database, compiled_gcbm_output.db, on which SQL queries are run to provide information on the simulation's output. Along with that, plots and visualisations can be created. These inferences and plots will be published as a comment on the pull request. Radis is also building a DVC Pipeline for GCBM.Belize that separates each step of the workflow. In this case, DVC not only provides a blueprint to organize the workflow but also offers the ability to determine the dependencies and outputs of each step of the workflow.

The DVC pipeline for GCBM.Belize consists of 12 stages. Each stage defines the following information:

Command that is going to be run in a particular stage
Working directory from which the command will run
Dependencies of a stage (i.e. the files that the command depends on)
Outputs of the command, which can be any file that is created; DVC also lets us define outputs as metrics files (JSON format) or plots files

At every stage of the pipeline, DVC tracks the dependencies and outputs using MD5 hashes (recorded in dvc.lock). There is also an option to set up remote storage (e.g. Google Drive, AWS) and store the outputs of each stage there.
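To illustrate the idea behind hash-based tracking, here is a minimal sketch of how a stage could be judged out of date by comparing the MD5 hashes of its dependencies with previously recorded values. It is a conceptual illustration only: DVC's real implementation is more involved, and the file name and the `lock` dictionary are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def stage_is_stale(deps: list[Path], lock: dict[str, str]) -> bool:
    """A stage needs to re-run if any dependency hash differs from the lock."""
    return any(lock.get(str(p)) != md5_of(p) for p in deps)


with tempfile.TemporaryDirectory() as tmp:
    dep = Path(tmp) / "input_database.csv"  # hypothetical dependency file
    dep.write_text("species,volume\nA,10\n")

    # Record the current hash, as a lock file would.
    lock = {str(dep): md5_of(dep)}
    unchanged = stage_is_stale([dep], lock)

    # Modify the dependency: its hash no longer matches the lock.
    dep.write_text("species,volume\nA,12\n")
    changed = stage_is_stale([dep], lock)

print(unchanged, changed)
```

Because only content hashes are compared, a stage whose inputs are byte-for-byte identical can be skipped on the next run.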

The main stages are:

1. tiler → defines the spatial layers needed for the simulation
2. recliner2gcbm → creates the input database
3-7. add_species_vol_to_bio, modify_root_parameters, modify_decay_parameters, modify_turnover_parameters, modify_spinup_parameters → apply preprocessing to the input database so that it better fits the Belize case
8. update_GCBM_configuration → updates the simulation configuration based on the contents of the /Standalone_GCBM/template
9. run_gcbm → runs the simulation
10. create_tiffs → generates the compiled spatial output
11. compile_results → generates the output database, a more user-friendly format for showing the output results
12. post_processing → creates the 3 figures that showcase the distribution of the four unique indicators throughout the simulation as well as across the different configurations (i.e. with the default parameters or the modified parameters). It also generates the metrics files, which contain the mean values of 3 different carbon stock metrics over 3 periods (1 period = 50 years) for every type of indicator and every type of LifeZone.
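The period averaging performed in the post_processing stage can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the metric keys, and the 150-year series are hypothetical, and only the 50-year period length comes from the description above.

```python
import json
from statistics import mean

PERIOD_LENGTH = 50  # years per period, as stated above


def period_means(values: list[float], period_length: int = PERIOD_LENGTH) -> dict[str, float]:
    """Split an annual series into consecutive periods and average each one."""
    out = {}
    for start in range(0, len(values), period_length):
        chunk = values[start:start + period_length]
        out[f"period_{start // period_length + 1}"] = mean(chunk)
    return out


# Hypothetical 150-year annual series for one indicator and one LifeZone.
series = [100.0] * 50 + [110.0] * 50 + [120.0] * 50
metrics = period_means(series)

# JSON is the format DVC expects for metrics files.
print(json.dumps(metrics, indent=2))
```

Writing the result as JSON lets it be declared as a metrics output, so that values can be compared between runs.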

Palak Sharma is working on the project Building UI Library for Moja Global, with the aim of creating an intuitive, consistent, and easy-to-use interface that helps developers within the User-Interface working group and users quickly accomplish their tasks. The library is a centralized collection of components covering moja global's colors and branding, typography, spacing, buttons, modals and forms, which will help establish a unified and consistent design language for contributors and users. A new repository has been created for the UI library. After creating prototypes of the UI library on Figma, Palak started working on the implementation. It was decided that pure CSS would be used for building the UI library. The existing codebase in the FLINT UI repository was migrated from Vue.js version 2 to version 3. She set up Storybook in the official UI Library repository to better demonstrate the components and to document how to use the UI Library. To make it easy for new developers, the usage and functionalities of the UI library will be documented. Palak and other contributors have successfully added fully customizable Dropdown, Alert, Button, Card, Datepicker, and Sponsors components to the project, while the Modal, Footer, Navbar, Toggle, Slider, and Accordion components are under review.

Progress so far - 2022 GSoD interns

August 2, 2022 · 2 min read

Namya LG

GSoD'22 Intern

Amarachi Iheanacho and Namya LG are GSoD interns for the year 2022.

Amarachi was offered the role of "Technical Writer 2". Her tasks mainly entail creating the Full Lands Integration Tool (FLINT) handbook, writing credible case studies on both FLINT and the Generic Carbon Budget Model (GCBM), and simplifying the documentation for the "Understanding FLINT" section of the official moja global documentation. The FLINT Handbook will serve as a guide and provide extensive detail about FLINT. In the initial weeks, Amarachi studied the GCBM and FLINT and created a broad outline of the contents of the Handbook. She wrote a case study on how the GCBM can be used to report and assess the effects of anthropogenic activities on forests. Amarachi then worked on the chapters of the Handbook. Chapter 1 covers fundamental concepts like stocks, flows, states and transitions. This chapter also introduces the reader to organizations that significantly influence global decisions in the fight against climate change. Chapter 2 discusses what FLINT is and, more importantly, the different modules FLINT uses to describe processes that drive carbon changes in a landscape. Chapter 3 discusses the GCBM, the data it requires to simulate carbon flows, and, more importantly, how to draw conclusions from the output it produces. In the coming months, she will work on finishing the FLINT handbook, creating credible documentation for the "Understanding FLINT" section of the official moja global documentation, and finally, writing case studies on climate change mitigation.

Namya was offered the role of "Technical Writer 1". The GCBM (Generic Carbon Budget Model) is written in C++ and was not fully documented. She started by creating a Doxygen-based documentation website for the GCBM, explaining its different modules and functions. To collect user feedback on the moja global technical documentation, Namya integrated a feedback funnel into the existing Sphinx website, using Google Sheets as the backend. She has completed a case study on FLINT.Cloud and its significance in the ecosystem. She is currently revamping the documentation and building Sphinx documentation websites for the GCBM.Belize and GCBM.Colombia projects. Namya is also involved in community-building and coordinates the weekly community meetings.

Progress so far - 2022 Summer LFX mentees

July 29, 2022 · 2 min read

Namya LG

GSoD'22 Intern

Janvi Thakkar and Sanjay Singh Rajpoot are LFX mentees working on the project Cloud Native Measurement, reporting, and validation of carbon emissions. They are currently contributing to the FLINT.Cloud and FLINT.UI Projects.

FLINT UI provides a user-friendly interface to run simulations. Janvi has added the following components to the UI-Library:

Card
Modal
Toast-alert
Box-alert
Animation effects of Accordion
Date picker (in progress)

Janvi is also contributing to developing the UI library. To further enhance the flexibility of deploying the FLINT.Cloud project, she is working on deploying the FLINT and GCBM examples on Kubernetes both on a local and production level. The Azure Kubernetes Service will be used at the production level.

Sanjay started his term working on FLINT.Cloud. His first task was to implement the CI/CD pipeline for the FLINT and GCBM examples, which required creating a new container image on the Azure Container Registry every time code was pushed to the master branch of the FLINT.Cloud repository. To simplify the production setup further, Sanjay worked with his mentor, Harsh, to write and test Terraform scripts that can be used to deploy FLINT.Cloud on Azure (Terraform is an Infrastructure as Code tool, i.e. the resources required for performing tasks are provisioned by writing scripts). The GCBM example provides REST APIs to run a simulation. Earlier it supported only a static configuration. In the previous LFX term, Padmaja worked on a boilerplate template to accommodate dynamic inputs. Sanjay is carrying this work forward by developing API endpoints and the user interface.

How AFOLU sector is responsible for GHG emission and removal?

July 25, 2022 · 4 min read

Sukriti Sood

Moja global Contributor

Agriculture, Forestry, and Other Land Use (AFOLU) plays a central role in food security and sustainable development. Plants take up carbon dioxide from the atmosphere and nitrogen from the soil when they grow, redistributing them among multiple pools, including above- and below-ground living biomass, dead residues, and soil organic matter.

A pool is a system that has the capacity to accumulate or release carbon.

Aboveground Biomass: The above-ground standing dry mass of live or dead matter from trees or shrubs, expressed as mass per unit area.

Belowground Biomass: The woody (axial roots) and non-woody (fine roots) tissues present in the tree root system.

CO2 and other greenhouse gases, largely methane and nitrous oxide, are in turn released to the atmosphere by plant respiration and by the decomposition of dead plant biomass and soil organic matter. Anthropogenic activities (e.g., management of croplands, forests, grasslands, wetlands) and changes to land use/cover (e.g., conversion of forest to cropland or pasture, afforestation) alter these natural processes. AFOLU activities lead to both sources of CO2 (e.g., deforestation, peatland drainage) and sinks of CO2 (e.g., afforestation, management for soil carbon sequestration), as well as non-CO2 emissions, predominantly from agriculture (e.g., CH4 from livestock and rice cultivation, N2O from manure storage and agricultural soils, and biomass burning).

Carbon sequestration refers to the process of capturing emissions of atmospheric carbon dioxide and storing them. It is one way of reducing carbon dioxide concentration in the atmosphere in order to slow global climate change.

**Historical trends in AFOLU, 1990-2010: Decadal averages**

ghg emissions by economic sector in 2019

The key greenhouse gases of concern are CO2, N2O and CH4. CO2 fluxes between the atmosphere and ecosystems are primarily controlled by uptake through plant photosynthesis and releases via respiration, decomposition and combustion of organic matter.

**The main greenhouse gas emission sources/removals and processes in managed ecosystems.**

Emission and Removal Processes

Greenhouse gas fluxes in the AFOLU Sector can be estimated in two ways:

Finding net changes in C stocks over time (used for most CO2 fluxes)
Directly finding gas flux rates to and from the atmosphere (used for estimating non-CO2 emissions and some CO2 emissions and removals).
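The two estimation approaches listed above can be illustrated with a short worked example. This is a sketch under stated assumptions: the stock values, head count, and emission factor are hypothetical numbers chosen for illustration, and the function names are not taken from any FLINT module. The conversion from carbon to CO2 uses the molecular weight ratio 44/12.

```python
def annual_stock_change(c_t1: float, c_t2: float, t1: int, t2: int) -> float:
    """Approach 1: net change in carbon stock per year (tC/yr)."""
    return (c_t2 - c_t1) / (t2 - t1)


def co2_flux_to_atmosphere(delta_c: float) -> float:
    """Convert a carbon stock change into a CO2 flux (tCO2/yr).

    A stock gain is a removal from the atmosphere, hence the minus sign.
    """
    return -delta_c * 44.0 / 12.0


def direct_emission(activity: float, emission_factor: float) -> float:
    """Approach 2: emission = activity data x emission factor."""
    return activity * emission_factor


# Hypothetical forest stock: 1000 tC in 2010, 1060 tC in 2020.
delta_c = annual_stock_change(1000.0, 1060.0, 2010, 2020)
flux = co2_flux_to_atmosphere(delta_c)

# Hypothetical herd: 200 animals, 50 kg CH4 per head per year.
ch4_kg = direct_emission(200, 50.0)

print(delta_c, flux, ch4_kg)
```

A negative CO2 flux here means the land acted as a sink over the period; the direct approach is the one typically used for non-CO2 gases such as CH4.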

Below are the major ecosystem stocks and processes associated with emission and removal in the AFOLU Sector, grouped by components, i.e.,

Biomass
Dead Organic Matter
Soils
Livestock

Biomass

Plant biomass, including above- and below-ground components, is the main mechanism for removing CO2 from the atmosphere. The exchange of CO2 between the atmosphere and terrestrial ecosystems is largely controlled by photosynthesis and respiration.

Dead Organic Matter

The majority of biomass generated in living plant material ends up in dead organic matter (DOM) pools (dead wood, for instance). DOM sometimes decomposes rapidly, releasing carbon into the atmosphere. However, a substantial amount of carbon is retained for months, years, or even decades. Through the decomposition rate and input of fresh organic matter, land use and management affect C stocks of dead organic matter. The burning of dead organic matter results in the release of CO2, N2O, CH4, NOx, NMVOC, and CO.

Soils

As organic matter decomposes, it is transformed into soil organic matter. In soil, organic matter is composed of a variety of compounds that reside for varying periods. The majority of this material is biodegradable, and much of it can be easily decomposed by microorganisms, resulting in the return of carbon to the atmosphere. However, some of the soil organic carbon is transformed into difficult-to-decompose compounds (e.g., organic-mineral complexes) that are very slowly decomposed and can last for decades or generations in the soil.

Livestock

Animal production systems, particularly those with ruminant animals (cattle, sheep, goats, buffalo, deer, elk, giraffes and camels), can be significant sources of greenhouse gas emissions. For example, enteric fermentation in the digestive systems of ruminants leads to the production and emission of CH4. CH4 and N2O are formed from decomposing manures as by-products of methanogenesis and nitrification/denitrification, respectively. NH3 and NOx are also lost by volatilization from manure management systems and soils, which results in indirect greenhouse gas emissions.

Progress so far - May 2022 Outreachy interns

July 23, 2022 · 2 min read

Namya LG

GSoD'22 Intern

Kateryna Derhaus and Ankita Sharma are May 2022 Outreachy interns.

The Generic Carbon Budget Model (GCBM) is built on top of the Full Lands Integration Tool (FLINT). It was developed by Canada to run simulations in the Canadian ecosystem. Adaptations of the GCBM include GCBM.Belize, GCBM.Colombia and GCBM.Chile. Kateryna Derhaus is working on developing a new simulation model for the Carpathians. She has run a test simulation with low-resolution spatial data and spatial data from Belize. On completion of the Carpathians simulation model, she will document the steps in Jupyter notebooks. The Recliner2GCBM tool is used to create the GCBM input database. She is working on reimplementing the Recliner2GCBM functionality in Python to make the example more accessible to non-Windows users.

Ankita started working on an analysis of Tropical Dry Forests in India and examined how the analysis can help form hypotheses about the soil type and the crops grown in the area. The analysis came with a dataset loading API that can be used to port it to another type of forest and to make the implementation more intuitive. To enhance the analysis of the GCBM.Belize simulation results, she has developed scripts to plot NEP (Net Ecosystem Production), NPP (Net Primary Production), NBP (Net Biome Production) and Rh (heterotrophic respiration). As FLINT.Cloud did not ship with an AppImage, Ankita helped create the AppImage script as well as AppImages for Ubuntu 22.04, 20.04, and 18.04 systems. The scripts have been deployed to the CI. Together with other mentees, Ankita is also working on the dynamic GCBM Pre-Processing API. This is a revision of the existing API that will be more robust and easier to understand or tweak based on requirements.
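The carbon indicators mentioned above are related by simple balances, which a short sketch can make concrete. It assumes Rh denotes heterotrophic respiration, as is usual for carbon budget indicators, and uses the standard definitions NEP = NPP - Rh and NBP = NEP - disturbance losses. The annual values are hypothetical, and this is not the plotting code from the internship.

```python
def net_ecosystem_production(npp: list[float], rh: list[float]) -> list[float]:
    """NEP = NPP - Rh, computed year by year."""
    return [p - r for p, r in zip(npp, rh)]


def net_biome_production(nep: list[float], losses: list[float]) -> list[float]:
    """NBP = NEP minus carbon lost to disturbances (e.g. fire, harvest)."""
    return [n - l for n, l in zip(nep, losses)]


# Hypothetical annual fluxes in tC/ha/yr for three years.
npp = [5.0, 5.5, 6.0]
rh = [4.0, 4.0, 4.5]
losses = [0.0, 2.0, 0.0]  # a disturbance in the second year

nep = net_ecosystem_production(npp, rh)
nbp = net_biome_production(nep, losses)
print(nep, nbp)
```

In the second year the disturbance loss exceeds NEP, so NBP turns negative even though the ecosystem itself is still accumulating carbon.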

Meet moja global May 2022 Outreachy interns

May 29, 2022 · 3 min read

Harsh Mishra

Community Manager

Moja global is excited to announce the selection of two Outreachy interns working to support our mission in developing open-source Monitoring, Reporting and Verifying (MRV) software for forestry, agriculture and other land use (AFOLU) sectors.

Outreachy provides open-source internships to underrepresented groups in the technology industry. Through Outreachy, we can provide a platform for our interns to learn about our open-source software and gain experience in the industry while making viable contributions to the open-source community.

Project

Moja global participated in the May 2022 Outreachy internships with the project Research data sources for carbon sequestration in forests. The project aims to develop a pilot project to monitor the effect of forest management practices around the world while supporting forest owners, stakeholders, and policymakers. The interns will use various forest management indicators and a representative sample of the ecosystem and climatic conditions to develop an open-source workflow for assessing ecological conditions in forests around the world.

They will also be developing a semi-automated, FLINT-based workflow to run a unique FLINT simulation of forest dynamics for each location. The interns will also be responsible for publishing their findings openly on our repositories, with editorial support from the moja global Technical Steering Committee.

Based on the work developed and the other contributions, we selected two interns: Kateryna Derhaus and Ankita Sharma.

Outreachy Intern #1: Kateryna Derhaus

Picture of Kateryna. Kateryna is the Outreachy 2022 Intern from moja global

Kateryna is a Computer Engineering major at Ankara University, Turkey. Her interest lies in data analytics, and she enjoys exploring new technologies and automating some of her daily routines. Her desire to contribute and impact the world she lives in led her to apply to Outreachy. She found the moja global project a perfect match for her interests, and by the end of the contribution period, she was sure about her decision. Her mission is to make a positive change, work on a great variety of existing projects and be a part of the welcoming community at moja global.

Outreachy Intern #2: Ankita Sharma

Picture of Ankita. Ankita is the Outreachy 2022 Intern from moja global

Ankita is currently pursuing a Bachelor of Technology in Electronics and Communication from the International Institute of Information Technology, Naya Raipur, India. She found an early-career interest in Computer Science and always wanted to see how real-life development works in an organization which led her to open-source development. She learned about Outreachy while searching for open source opportunities and discovered the moja global organization and found the project very interesting. After contributing during the contribution period, she is excited to work for the common goal with the moja global community!

We want to thank the Software Freedom Conservancy for running the Outreachy internships. Welcome, Kateryna and Ankita, to the Outreachy internships! Thanks for your interest, and we hope it will be a great learning experience for both of you!

How to get involved?

Moja global hosts mentorship programs for students and professionals to learn more about open-source software and get involved in the open-source community. We are looking for students and professionals to join us in our mentorship programs. If you are interested in learning more about open-source software, or if you are interested in getting involved in the open-source community, please join our Slack community.

To know more about the mentorship programs we participate in, have a look at our mentorship page.

Meet moja global Google Season of Docs 2022 interns

May 17, 2022 · 4 min read

Harsh Mishra

Community Manager

Moja global is excited to announce the selection of two Google Season of Docs interns working to support our mission in developing open-source Monitoring, Reporting, and Verifying (MRV) software for forestry, agriculture, and other land use (AFOLU) sectors.

Google Season of Docs is an annual program by Google providing an opportunity for technical writers to contribute to open-source software. Moja global has been participating in Google Season of Docs (GSoD) for the past two years, and we are participating for the third time this year. This year's Google Season of Docs aims to build technical documentation, case studies, and testing existing documentation alongside preparing an all-new FLINT handbook and preparing credible docs on climate science.

GIF showing moja global and GSoD's logo side-by-side

About the project

Last year’s Google Season of Docs saw the development of a Documentation Working Group, led by our Google Season of Docs interns, which has led to a 400% increase in the community size due to the active engagement, thus providing an easy entry point for new contributions. Though we saw a rise in technical documentation, reference guides, and case studies, there is still much to do. We are currently focussing on more technical documentation regarding the capabilities of our central framework to provide clear examples of common FLINT use cases.

While the FLINT framework can describe any generic stock and flow model (including finances, water, or biodiversity), landscape and carbon process modeling is the most popular use case. In addition to technical documentation of our software, we will also be building the FLINT Analysis Handbook, which will serve as a gentle introduction for new users progressing through the FLINT training examples. These projects will explain the core ecosystem and analysis concepts taking place when running a FLINT model and how it exactly works!

Based on the application process, which revolved around submitting a proposal (covering their technical writing & open-source experience) and community interactions, we selected two mentees: Namya LG and Amarachi Iheanacho.

Namya LG

Picture of Namya. Namya is the GSoD 2022 intern from moja global

Namya LG is an open-source enthusiast and loves learning and experimenting. She is very passionate about documenting her developer journey and writes on Medium. Namya started contributing to moja global through the Winter Of Code program in 2021. She is inspired by moja global’s initiative to combat the climate crisis using technology and has contributed to the community website, FLINT Cloud, FLINT Reporting, and is currently exploring more in the moja global ecosystem.

Namya will be working on the Building technical documentation, case studies, and testing existing documentation project, where she will work on developing technical documentation for moja global users, building credible case studies to promote moja global’s FLINT further, and building a pipeline to test and validate existing documentation.

Amarachi Iheanacho

Picture of Amarachi. Amarachi is the GSoD 2022 intern from moja global

Amarachi Iheanacho is a software engineer and an adept technical writer. With over 2 years of experience in creating technical articles and end-user documentation, Amarachi has created technical content for organizations like Cloudinary, Appwrite, Devwares, and Hackernoon.

Amarachi will be working on the Building FLINT handbook and preparing credible docs on climate science project, where she will write more approachable FLINT development and calibration documentation. It will help our users and contributors better understand climate science, carbon models, and how FLINT fits amongst them.

How to get involved?

Moja global hosts mentorship programs all year round. You can find more details about the projects at moja global on our GitHub organization and the various mentorship programs we participate in on our initiatives page. These mentorship programs allow students and professionals to learn more about open-source software and get involved in the open-source community.

If you are interested in learning more about open-source software, or if you are interested in getting involved in the open-source community, please join our Slack community.
