AWS Configuration to Automatically Notify Webhooks of New Papers on arXiv
Preparation
Check Environment
Check your environment. Only Ubuntu 22.04.4 LTS is supported.
lsb_release -a
# Ubuntu 22.04.4 LTS
uname -m
# x86_64
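As an optional alternative to the two commands above, the following Python sketch performs the same check. It assumes a Linux system with an os-release file, which it reads through platform.freedesktop_os_release (Python 3.10+). EXPECTED_OS, EXPECTED_ARCH, and environment_matches are hypothetical names.

```python
import platform

# The setup this guide was written for (taken from the checks above).
EXPECTED_OS = "Ubuntu 22.04.4 LTS"
EXPECTED_ARCH = "x86_64"


def environment_matches(os_release: dict, machine: str) -> bool:
    """Return True when the os-release info and CPU architecture match the tested setup."""
    return os_release.get("PRETTY_NAME") == EXPECTED_OS and machine == EXPECTED_ARCH


if __name__ == "__main__":
    try:
        # Parses /etc/os-release (or /usr/lib/os-release) on Linux.
        info = platform.freedesktop_os_release()
    except OSError:
        # Not a freedesktop-compatible system (e.g. macOS or Windows).
        info = {}
    print(environment_matches(info, platform.machine()))
```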
Create Zip for AWS Lambda Layer
Clone my GitHub project, arxiv-bot:
git clone git@github.com:kktsuji/arxiv-bot.git
cd arxiv-bot
Python version must be 3.12.3.
python --version
# Python 3.12.3
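The sketch below is a minimal Python equivalent of the version check above. One plausible reason the version matters is that pip installs packages for the interpreter that runs it, including any compiled extensions, and the layer will run on Lambda's Python 3.12 runtime. REQUIRED_VERSION and python_version_ok are hypothetical names.

```python
import sys

# Version required by this guide.
REQUIRED_VERSION = (3, 12, 3)


def python_version_ok(version_info=None) -> bool:
    """Return True if the given (or running) interpreter version is exactly 3.12.3."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == REQUIRED_VERSION


if __name__ == "__main__":
    print(sys.version.split()[0], python_version_ok())
```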
Create a directory named python, install the required Python packages into it, and then zip it as python.zip. This file will be used for the AWS Lambda layer.
mkdir python
pip install -U pip
pip install -r requirements.txt -t ./python
zip -r python.zip ./python
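Before uploading, you may want to confirm that every entry in the archive sits under the top-level python/ directory, which is where a Python layer's packages are expected. The sketch below uses the standard zipfile module. layer_zip_has_python_dir is a hypothetical helper.

```python
import zipfile


def layer_zip_has_python_dir(path: str) -> bool:
    """Return True if every entry in the archive lives under a top-level python/ directory."""
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
    # An empty archive is rejected as well.
    return bool(names) and all(name.startswith("python/") for name in names)
```

For example, layer_zip_has_python_dir("python.zip") should return True for an archive built with the commands above.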
Webhook Settings
Get the webhook URL of the service you want to notify (e.g. Slack or Teams).
(Optional) OpenAI Settings
Get an OpenAI API key if you want papers to be summarized.
Note: The OpenAI API is a paid service.
AWS Lambda Settings
Lambda Layer
Open the AWS Lambda console and choose “Create layer”.
Upload python.zip and set “Compatible architectures” to the architecture of the environment where you created the zip file (x86_64 in this guide). Then set “Compatible runtimes” to Python 3.12.
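Note: Lambda makes a layer's python/ directory importable, so the packages must sit under a top-level python/ directory inside the zip, as in the archive created above. If they do not, the Lambda function will fail to import third-party Python modules.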
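You can also script this step instead of using the console by publishing the same layer with boto3's publish_layer_version. The sketch below assumes that boto3 is installed and that AWS credentials are configured. The layer name arxiv-bot-layer is hypothetical. The client is passed in so that you can inspect the request without calling AWS.

```python
def layer_request(zip_bytes: bytes, layer_name: str = "arxiv-bot-layer",
                  architecture: str = "x86_64") -> dict:
    """Build keyword arguments for boto3's lambda publish_layer_version().

    "arxiv-bot-layer" is a hypothetical name; use whatever you called the layer.
    """
    return {
        "LayerName": layer_name,
        "Content": {"ZipFile": zip_bytes},
        "CompatibleRuntimes": ["python3.12"],
        "CompatibleArchitectures": [architecture],
    }


def publish_layer(lambda_client, zip_path: str = "python.zip") -> str:
    """Upload the zip as a new layer version and return its LayerVersionArn."""
    with open(zip_path, "rb") as f:
        zip_bytes = f.read()
    response = lambda_client.publish_layer_version(**layer_request(zip_bytes))
    return response["LayerVersionArn"]
```

For example, publish_layer(boto3.client("lambda")) would upload python.zip from the current directory and return the ARN of the new layer version.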
Lambda Function
In the AWS Lambda console, choose “Create function”.
Select “Author from scratch” and fill in the form, choosing the Python 3.12 runtime and the same architecture as the layer.
After creating the function, open the “Add a layer” page.
Select “Custom layers”, then choose the layer you created and its version.
Copy the entire contents of main.py from the arxiv-bot project and paste them into “Code” > “Code source” > lambda_function.py.
Then click the “Deploy” button.
Click “Test” and fill in “Configure test event”.
The “Event JSON” must follow this format (these parameters are used only for the test):
{
"webhook_url": "https://YOUR_WEBHOOK_URL",
"keywords": "keyword1,keyword2,keyword3",
"categories": "cs.AI,cs.CV,cs.LG,eess.IV",
"openai_api_key": "YOUR_API_KEY"
}
Key | Description |
---|---|
webhook_url | The webhook URL of Slack, Teams, or another service. |
keywords | Keywords used in the arXiv search query, separated by commas with no spaces. Keywords are searched for in titles and abstracts and combined with “or”. For example, if the value “keyword1,key word2” is specified, papers containing keyword1 and papers containing ‘key word2’ are returned as search results (a keyword that contains spaces is wrapped in single quotation marks). |
categories | Categories used in the arXiv search query. They follow the same rules as keywords (separated by commas without spaces, combined with “or”), and any spaces are removed. For more details, see arXiv Category Taxonomy. |
openai_api_key | (Optional) OpenAI API key. If you do not use the paper summarization function, leave it blank: "openai_api_key": "" |
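The sketch below turns the event values into an arXiv API query to make the keyword and category rules concrete. It is not the code in main.py. The ti:/abs:/cat: field prefixes, the AND between the keyword and category groups, and the double-quoted phrases all follow the public arXiv API conventions. They are assumptions about how the project builds its query.

```python
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"


def build_search_query(keywords: str, categories: str) -> str:
    """Turn comma-separated keywords/categories into an arXiv search_query string.

    Assumptions (not taken from main.py): each keyword is matched in titles (ti:)
    or abstracts (abs:), multi-word keywords are phrase-quoted with double quotes
    as in the arXiv API manual, and the keyword group is ANDed with the categories.
    """
    terms = []
    for keyword in (k.strip() for k in keywords.split(",")):
        if not keyword:
            continue
        quoted = f'"{keyword}"' if " " in keyword else keyword
        terms.append(f"ti:{quoted} OR abs:{quoted}")
    # Spaces are removed from the categories value before splitting.
    cats = [c for c in categories.replace(" ", "").split(",") if c]
    if not terms or not cats:
        raise ValueError("keywords and categories must each contain at least one entry")
    keyword_part = " OR ".join(terms)
    category_part = " OR ".join(f"cat:{c}" for c in cats)
    return f"({keyword_part}) AND ({category_part})"


def build_api_url(search_query: str, max_results: int = 50) -> str:
    """Build an arXiv API URL that returns the newest submissions first."""
    params = {
        "search_query": search_query,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"
```

For example, build_search_query("keyword1,key word2", "cs.AI,cs.CV") returns '(ti:keyword1 OR abs:keyword1 OR ti:"key word2" OR abs:"key word2") AND (cat:cs.AI OR cat:cs.CV)'.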
Save the configuration and run the test.
You can check the execution results in “Code source” or in the service behind the webhook URL you entered.
Once it works, note the “Function ARN” of the Lambda function.
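You can also use the noted ARN to run the same test outside the console. The sketch below calls boto3's Lambda invoke synchronously with a test-event dictionary. It assumes boto3 and configured AWS credentials. invoke_once is a hypothetical helper.

```python
import json


def invoke_once(lambda_client, function_arn: str, event: dict):
    """Synchronously invoke the function with the same JSON as the test event.

    Returns the decoded return value of the handler (None if it returns nothing).
    """
    response = lambda_client.invoke(
        FunctionName=function_arn,
        InvocationType="RequestResponse",
        Payload=json.dumps(event).encode("utf-8"),
    )
    # response["Payload"] is a stream; read it and decode the JSON body.
    body = json.loads(response["Payload"].read())
    # Lambda sets FunctionError when the handler raised an exception.
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda function failed: {body}")
    return body
```

For example, invoke_once(boto3.client("lambda"), "YOUR_FUNCTION_ARN", event) with the test event above as a dictionary.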
AWS IAM Settings
Create a policy that allows invoking the Lambda function and a role that EventBridge can assume with that policy.
IAM Policy
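Visit AWS IAM > Policies > “Create policy”, switch to the JSON tab, and paste the following. Replace the whole “Resource” value with the Function ARN you noted earlier. It has the form arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": "lambda:InvokeFunction",
"Resource": "YOUR_LAMBDA_FUNCTION_ARN"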
}
]
}
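It is easy to paste the wrong value into “Resource”. To guard against that, you can generate the policy from the ARN and reject anything that does not look like a Lambda function ARN. The regular expression is a rough approximation written for this sketch, not an official ARN grammar, and the example ARN is made up.

```python
import json
import re

# Rough shape of a Lambda function ARN, optionally with a version or alias qualifier.
FUNCTION_ARN = re.compile(
    r"^arn:aws[a-z-]*:lambda:[a-z0-9-]+:\d{12}:function:[A-Za-z0-9_-]+(:[A-Za-z0-9$_-]+)?$"
)


def invoke_policy(function_arn: str) -> dict:
    """Return the invoke policy with Resource set to a validated function ARN."""
    if not FUNCTION_ARN.match(function_arn):
        raise ValueError(f"not a Lambda function ARN: {function_arn!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": function_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical ARN; use the one you noted from the Lambda console instead.
    arn = "arn:aws:lambda:ap-northeast-1:123456789012:function:arxiv-bot"
    print(json.dumps(invoke_policy(arn), indent=2))
```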
IAM Role
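Visit AWS IAM > Roles > “Create role”, select “Custom trust policy”, and paste the following trust policy. It allows EventBridge Scheduler to assume the role.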
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "admitEventBridge",
"Effect": "Allow",
"Principal": {
"Service": "scheduler.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Then attach the policy to the role.
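You can also script the role creation and policy attachment with boto3's IAM client (create_role and attach_role_policy). The sketch below assumes boto3 and sufficient IAM permissions. You supply the role name and the policy ARN. create_invoker_role is a hypothetical helper.

```python
import json


def scheduler_trust_policy() -> dict:
    """The trust policy above: lets EventBridge Scheduler assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "admitEventBridge",
                "Effect": "Allow",
                "Principal": {"Service": "scheduler.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_invoker_role(iam_client, role_name: str, policy_arn: str) -> str:
    """Create the role, attach the invoke policy, and return the role ARN.

    iam_client is expected to be boto3.client("iam"); role_name is your choice.
    """
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(scheduler_trust_policy()),
    )
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return response["Role"]["Arn"]
```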
AWS EventBridge Settings
Set up a schedule that runs the Lambda function at a fixed time each day.
EventBridge Schedule
Open the AWS EventBridge console and choose “Create schedule”.
Fill in the form. Check the time zone and cron settings so that the Lambda function runs at the intended time.
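EventBridge Scheduler cron expressions have six fields: minutes, hours, day-of-month, month, day-of-week, and year. Either day-of-month or day-of-week must be ?. The helper below is a sketch that produces a once-a-day expression. The hour is evaluated in the time zone chosen for the schedule, not necessarily UTC. daily_cron is a hypothetical name.

```python
def daily_cron(hour: int, minute: int = 0) -> str:
    """Return an EventBridge Scheduler cron expression that fires once a day.

    Fields: minutes hours day-of-month month day-of-week year. Day-of-week is "?"
    because one of day-of-month / day-of-week must be "?".
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return f"cron({minute} {hour} * * ? *)"
```

For example, daily_cron(9) gives cron(0 9 * * ? *), which runs at 09:00 in the selected time zone. In the console, the fields inside cron( ) are entered individually.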
In the target step, select “AWS Lambda Invoke”, choose your function, and enter the JSON payload used to query arXiv (these parameters are used for the daily queries):
{
"webhook_url": "https://YOUR_WEBHOOK_URL",
"keywords": "keyword1,keyword2,keyword3",
"categories": "cs.AI,cs.CV,cs.LG,eess.IV",
"openai_api_key": "YOUR_API_KEY"
}
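A typo in this payload only surfaces when the schedule fires, so it can help to validate the payload locally first. The sketch below checks the rules from the key table. Treating a non-https webhook URL as an error is an extra assumption, since Slack and Teams webhook URLs use https. validate_payload is a hypothetical helper.

```python
import json

REQUIRED_KEYS = ("webhook_url", "keywords", "categories", "openai_api_key")


def validate_payload(text: str) -> dict:
    """Parse the schedule payload and check it against the key table's rules."""
    event = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in event]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    for key in REQUIRED_KEYS:
        if not isinstance(event[key], str):
            raise ValueError(f"{key} must be a string")
    # Assumption: webhook services such as Slack and Teams use https URLs.
    if not event["webhook_url"].startswith("https://"):
        raise ValueError("webhook_url should be an https:// URL")
    # Keywords are comma-separated with no space after or before the comma.
    if not event["keywords"] or ", " in event["keywords"] or " ," in event["keywords"]:
        raise ValueError("keywords must be non-empty and comma-separated without spaces")
    # Spaces in categories are removed, so only emptiness is an error here.
    if not event["categories"].replace(" ", ""):
        raise ValueError("categories must not be empty")
    # openai_api_key may be "" when summarization is not used.
    return event
```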
Fill in the remaining fields.
Under Permissions > Execution role, select “Use existing role” and choose the IAM role you created in “Role name”.
Create schedule.
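For reference, the schedule configured above corresponds roughly to a single create_schedule call on boto3's EventBridge Scheduler client. The sketch below only builds the arguments. The schedule name, time zone, and ARNs in the usage line are placeholders for you to replace.

```python
import json


def schedule_request(name: str, function_arn: str, role_arn: str, event: dict,
                     hour: int, minute: int = 0, timezone: str = "UTC") -> dict:
    """Build keyword arguments for boto3's scheduler create_schedule().

    The schedule fires daily at hour:minute in `timezone` and passes `event`
    to the Lambda function as its input.
    """
    return {
        "Name": name,
        "ScheduleExpression": f"cron({minute} {hour} * * ? *)",
        "ScheduleExpressionTimezone": timezone,
        # Run exactly at the scheduled time rather than within a window.
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": function_arn,
            "RoleArn": role_arn,
            "Input": json.dumps(event),
        },
    }
```

For example: boto3.client("scheduler").create_schedule(**schedule_request("arxiv-bot-daily", FUNCTION_ARN, ROLE_ARN, event, hour=9, timezone="Asia/Tokyo")).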
Settings completed!