Merge pull request #34 from Plikt/main
Feature update: Returns Open Access URL when only a DOI is submitted
Plikt authored May 23, 2024
2 parents 43f4cf1 + a792f39 commit 0f48a8f
Showing 9 changed files with 294 additions and 1,106 deletions.
Binary file modified .DS_Store
Binary file not shown.
80 changes: 53 additions & 27 deletions README.md
@@ -1,25 +1,14 @@
# automating-metadata
# Automating Author Identification

A central place to drop all of our experiments! As a disclaimer - we're still at the experimentation phase! Lots of this code will change.
This is the first operational version of the automated metadata application. We've updated it to meet the requirements of the [DeSci Nodes](https://nodes.desci.com) application as a first use case.

The core of this project is to see how we can use LLMs and other technology available to us to address the following problems (not an exhaustive list):
This version of the project implements one quality-of-life improvement for submitting articles to preprint servers: it removes the tedium of manually entering an ORCID iD for each co-author on a paper and promotes consistent use of the ORCID standard.

1. Metadata is non-standard/incomparable.
2. Metadata is inconsistently applied.
3. Metadata is inflexible.

These features create a problem which makes relevant literature difficult to connect to each other, leaves the decision of what metadata to include up to researchers or journals, and requires standard application to be useful. We then end up with inconsistent metadata that can't connect research projects either to their own research objects like the code and data associated with them, or to other papers that might be relevant. These issues make academic search and representing an accurate 'map' of science extremely difficult.

We hope to see how we can create a flexible metadata standard that accurately connects papers both to their own additional materials and to other research based on as much data as we can gather.

We're operationalizing this problem as:

How might we reliably automate a literature review?
For the full scope of the Automating Metadata project, see the [Automating Metadata Repository](https://github.com/DeSci-md/automating-metadata).

## Publication Text Extraction
### Aim/Goal
The aim/goal of this is to provide a way to programmatically extract machine-readable text from journal publications when provided an identifier such as a title or DOI.
Adapted using the methodology described in https://www.nature.com/articles/s41524-021-00687-2, "Automated pipeline for superalloy data by text mining"
The aim/goal of this is to return the author names, affiliations, and ORCID iDs for a publication when given a DOI or a PDF (published or unpublished). See [endpoint](#endpoint) for an example of the project's input and output.

### Setup and use

@@ -28,34 +17,71 @@ Make sure you have Docker installed on your machine. You can download it from [D
### Clone the Repository

```bash
git clone https://github.com/DeSci-md/automating-metadata.git
git clone https://github.com/DeSci-md/automating-metadata-v1.git
cd automating-metadata-v1
```

### Build the Docker Image

```bash
docker build -t your-image-name .
docker build -t automating-metadata-v1 .
```

### Run the Docker Container

```bash
docker run -e NODE_ENV=your-node-env -e DOI_ENV=your-doi-env your-image-name
docker run -p 5001:5001 automating-metadata-v1
```

Replace `your-node-env` and `your-doi-env` with the desired values for `NODE_ENV` and `DOI_ENV`.
The application will be available at `http://localhost:5001/invoke-script`.

### Setting Environment Variables
#### Endpoint

- **NODE_ENV**: This determines the node you want to get metadata for, e.g. 46.
- **DOI_ENV**: This is a DOI that corresponds to the node. If you do not have a DOI for the article, do not set this variable.
The application exposes a single endpoint:

You can set these environment variables using the `-e` option with the `docker run` command.
**POST /invoke-script**
- **Request Body**: JSON object with `pdf` and `doi` fields
- **Response**: JSON object with `output` field containing the result of the `langchain_orcid2.run()` function

### Example
Example request:
```json
{
  "pdf": "path/to/pdf/file",
  "doi": "10.1234/example-doi"
}
```
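
The request above could be sent from Python roughly as follows. This is a minimal sketch using only the standard library, not the project's own client: the helper names `build_request` and `invoke_script` are hypothetical, the URL assumes the container was started with `-p 5001:5001` as shown earlier, and omitting fields that are not supplied is an assumption (the service may instead expect `null` or an empty string).

```python
import json
import urllib.request

# Assumed from the README: the container publishes port 5001 on localhost.
ENDPOINT = "http://localhost:5001/invoke-script"


def build_request(pdf=None, doi=None, url=ENDPOINT):
    """Build (but do not send) a POST request with a JSON body.

    Assumption: fields left as None are omitted from the body.
    """
    fields = {"pdf": pdf, "doi": doi}
    payload = {key: value for key, value in fields.items() if value is not None}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke_script(pdf=None, doi=None, url=ENDPOINT, timeout=300):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_request(pdf=pdf, doi=doi, url=url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `invoke_script(doi="10.1234/example-doi")` would return the parsed response body. The generous timeout is a guess, on the assumption that author extraction may take a while.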

```bash
docker run -e NODE_ENV="46" -e DOI_ENV="10.3847/0004-637X/828/1/46" your-image-name
Example response:
```json
{
  "output": {
    "authors": {
      "M. Anonymous": {
        "@id": "https://orcid.org/0000-0000-0000-0000",
        "affiliation": "Well-known Foundation",
        "name": "Mark Anonymous",
        "role": "Person"
      }
    },
    "title": "The title of the paper is \"The Extracted Title"
  }
}
```
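
A response of this shape could be flattened into one row per author as sketched below. The function `list_authors` is a hypothetical helper, and the sample dictionary reproduces only the `authors` part of the example response; real responses are assumed, not guaranteed, to follow the same layout.

```python
# Sample data copied from the example response (authors part only).
SAMPLE_RESPONSE = {
    "output": {
        "authors": {
            "M. Anonymous": {
                "@id": "https://orcid.org/0000-0000-0000-0000",
                "affiliation": "Well-known Foundation",
                "name": "Mark Anonymous",
                "role": "Person",
            }
        }
    }
}


def list_authors(response):
    """Return one (name, affiliation, orcid_url) tuple per author entry."""
    authors = response.get("output", {}).get("authors", {})
    rows = []
    for key, record in authors.items():
        # Fall back to the dictionary key if the record has no "name" field.
        name = record.get("name", key)
        rows.append((name, record.get("affiliation"), record.get("@id")))
    return rows


rows = list_authors(SAMPLE_RESPONSE)
for name, affiliation, orcid in rows:
    print(f"{name} | {affiliation} | {orcid}")
```

Using `.get()` throughout means a response without an `output` or `authors` key yields an empty list rather than raising, which seems a reasonable default when the extraction finds no authors.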

### Contribution
If you would like to contribute to this project, please follow these guidelines:
- Fork the repository
- Create a new branch for your feature or bug fix
- Make your changes and commit them
- Push your branch to your forked repository
- Submit a pull request to the main repository

### Credits

This project uses the many third-party libraries documented in the requirements file!

### License
This project is licensed under the [GNU General Public License v3.0](https://www.gnu.org/licenses/gpl-3.0.en.html).

Binary file modified app/.DS_Store
Binary file not shown.