Skip to content

Commit

Permalink
Merge branch 'main' into nsc/mvp-search
Browse files Browse the repository at this point in the history
  • Loading branch information
nickscamara committed Apr 23, 2024
2 parents 41263bb + 841279c commit 9ded75a
Show file tree
Hide file tree
Showing 4 changed files with 88 additions and 3 deletions.
6 changes: 5 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,17 @@

Crawl and convert any website into LLM-ready markdown. Build by [Mendable.ai](https://mendable.ai?ref=gfirecrawl)


*This repository is currently in its early stages of development. We are in the process of merging custom modules into this mono repository. The primary objective is to enhance the accuracy of LLM responses by utilizing clean data. It is not ready for full self-host yet - we're working on it*

## What is Firecrawl?

[Firecrawl](https://firecrawl.dev?ref=github) is an API service that takes a URL, crawls it, and converts it into clean markdown. We crawl all accessible subpages and give you clean markdown for each. No sitemap required.

_Pst. hey, you, join our stargazers :)_

<img src="https://github.com/mendableai/firecrawl/assets/44934913/53c4483a-0f0e-40c6-bd84-153a07f94d29" width="200">


## How to use it?

We provide an easy to use API with our hosted version. You can find the playground and documentation [here](https://firecrawl.dev/playground). You can also self host the backend if you'd like.
Expand Down
3 changes: 2 additions & 1 deletion apps/api/.env.example
Original file line number Diff line number Diff line change
Expand Up @@ -22,4 +22,5 @@ BULL_AUTH_KEY= #
LOGTAIL_KEY= # Use if you're configuring basic logging with logtail
PLAYWRIGHT_MICROSERVICE_URL= # set if you'd like to run a playwright fallback
LLAMAPARSE_API_KEY= #Set if you have a llamaparse key you'd like to use to parse pdfs
SERPER_API_KEY= #Set if you have a serper key you'd like to use as a search api
SERPER_API_KEY= #Set if you have a serper key you'd like to use as a search api
SLACK_WEBHOOK_URL= # set if you'd like to send slack server health status messages
11 changes: 10 additions & 1 deletion apps/api/requests.http
Original file line number Diff line number Diff line change
Expand Up @@ -49,4 +49,13 @@ content-type: application/json

### Check Job Status
GET https://api.firecrawl.dev/v0/crawl/status/cfcb71ac-23a3-4da5-bd85-d4e58b871d66
Authorization: Bearer
Authorization: Bearer

### Get Active Jobs Count
GET http://localhost:3002/serverHealthCheck
content-type: application/json

### Notify Server Health Check
GET http://localhost:3002/serverHealthCheck/notify
content-type: application/json

71 changes: 71 additions & 0 deletions apps/api/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,77 @@ app.get(`/admin/${process.env.BULL_AUTH_KEY}/queues`, async (req, res) => {
}
});

app.get(`/serverHealthCheck`, async (req, res) => {
try {
const webScraperQueue = getWebScraperQueue();
const [waitingJobs] = await Promise.all([
webScraperQueue.getWaitingCount(),
]);

const noWaitingJobs = waitingJobs === 0;
// 200 if no active jobs, 503 if there are active jobs
return res.status(noWaitingJobs ? 200 : 500).json({
waitingJobs,
});
} catch (error) {
console.error(error);
return res.status(500).json({ error: error.message });
}
});

app.get('/serverHealthCheck/notify', async (req, res) => {
if (process.env.SLACK_WEBHOOK_URL) {
const treshold = 1; // The treshold value for the active jobs
const timeout = 60000; // 1 minute // The timeout value for the check in milliseconds

const getWaitingJobsCount = async () => {
const webScraperQueue = getWebScraperQueue();
const [waitingJobsCount] = await Promise.all([
webScraperQueue.getWaitingCount(),
]);

return waitingJobsCount;
};

res.status(200).json({ message: "Check initiated" });

const checkWaitingJobs = async () => {
try {
let waitingJobsCount = await getWaitingJobsCount();
if (waitingJobsCount >= treshold) {
setTimeout(async () => {
// Re-check the waiting jobs count after the timeout
waitingJobsCount = await getWaitingJobsCount();
if (waitingJobsCount >= treshold) {
const slackWebhookUrl = process.env.SLACK_WEBHOOK_URL;
const message = {
text: `⚠️ Warning: The number of active jobs (${waitingJobsCount}) has exceeded the threshold (${treshold}) for more than ${timeout/60000} minute(s).`,
};

const response = await fetch(slackWebhookUrl, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(message),
})

if (!response.ok) {
console.error('Failed to send Slack notification')
}
}
}, timeout);
}
} catch (error) {
console.error(error);
}
};

checkWaitingJobs();
}
});


app.get("/is-production", (req, res) => {
res.send({ isProduction: global.isProduction });
});

0 comments on commit 9ded75a

Please sign in to comment.