diff --git a/data/success_stories/ahrefs.md b/data/success_stories/ahrefs.md index 667f3b4e8a..bedfc5d11f 100644 --- a/data/success_stories/ahrefs.md +++ b/data/success_stories/ahrefs.md @@ -1,30 +1,72 @@ --- -title: Peta-Byte Scale Web Crawler +title: Marketing Intelligence Tools Powered by Big Data logo: success-stories/ahrefs.svg card_logo: success-stories/white/ahrefs.svg background: /success-stories/ahrefs-bg.jpg theme: blue -synopsis: "Ahrefs crawls the entire internet constantly to collect, process, and store data to build an all-in-one SEO toolkit." +synopsis: "Ahrefs leverages OCaml to achieve massive scalability, processing billions of daily requests, running the third biggest web crawler in the world, while maintaining a lean, efficient team." url: https://ahrefs.com/ priority: 2 +why_ocaml_reasons: +- Type Safety +- Expressiveness and Reliability +- Unified Technology Stack +- Integration with JavaScript Ecosystem +- Customizability +- Scalability +- Performance --- -Ahrefs develops custom distributed petabyte-scale storage and runs an internet-wide crawler to index the entire Web. The company also builds various analytical services for end users. Ahrefs’s data processing system uses OCaml as its primary language, which currently processes up to 6 billion pages a day, and they also use OCaml for their website’s backend. Ahrefs has a multinational team with roots in the Ukraine, an office in Singapore, and remote collaborators all around the world. - ## Challenge -Ahrefs runs with a relatively small team compared to the size of the task at hand. Indexing the web is very expensive and requires considerable resources, both humans and machines. Turning petabytes of data into something intelligible on the fly is also a big challenge. It’s necessary to build processes running fast, 24/7, with as little maintenance as possible and scarce human resources. +Ahrefs has always been committed to operational efficiency and self-reliance, even as it grew to become a leader in the SEO industry. Unlike many VC-funded companies, Ahrefs has chosen to prioritize maintaining a small, focused team while achieving exceptional output. However, this approach came with its own set of challenges. + + +One major challenge was maintaining high-quality standards across the codebase. Ahrefs needed a cohesive and type-safe technology stack that worked seamlessly across the frontend and backend, enabling developers to write reliable code with confidence and ensuring a consistent, high-quality output. + +Another challenge was scaling operations. The lack of a integration between the different bricks often required teams to wait on one another, needing collaboration to create APIs, leading to delays and inefficiencies in delivering features. The team was looking for a unified stack, empowering all engineers to contribute to any part of the stack, reducing bottlenecks, increasing ownership and business knowledge, while maintaining a high trust in the final result. + +## Result + +Adopting OCaml was a transformative decision for Ahrefs, enabling the company to overcome its challenges and significantly improve operations. By transitioning the entire frontend team from PHP and jQuery to OCaml, supported by Melange and React, Ahrefs achieved a full-stack evolution. + +One of the most impactful outcomes was the ability to share data types between the frontend and backend. This streamlined development processes, reduced friction, and allowed engineers to contribute meaningfully across the entire stack. With this new technology foundation, Ahrefs not only maintained but significantly scaled its operations. + +Today, Ahrefs’ systems handle **millions of frontend requests per day** and process **billions of backend requests per day**. The codebase powering this scale has grown to **1.5 million lines of OCaml code**, underscoring the robustness and scalability of their technology stack. These advancements have helped Ahrefs become a trusted industry leader, relied upon by **44% of Fortune 500 companies**. +For more up-to-date statistics, see [Ahrefs' statistics and metrics page](https://ahrefs.com/big-data). + +## Why OCaml + +Ahrefs chose OCaml as the backbone of its technology stack because of its unique ability to combine expressiveness and reliability. The language allowed developers to write concise, efficient code that remained stable over the long term, with systems often running for years without surprises. + +One of the key reasons for selecting OCaml was its versatility in enabling a unified technology stack. The ability to use shared types between frontend and backend eliminated redundant work, improved collaboration, and ensured consistency across the board. Additionally, OCaml’s robust type system provided engineers with a high degree of confidence in their code, significantly reducing errors and simplifying maintenance. + +OCaml’s ecosystem, especially with the introduction of Melange (formerly BuckleScript), allowed Ahrefs to bridge the gap between OCaml’s strong typing and the flexibility of the JavaScript ecosystem. Initially adopted for crawling and data processing, OCaml’s utility expanded to the frontend when Melange made it possible to compile OCaml to JavaScript. This decision empowered Ahrefs to create tools that made their vast datasets more accessible and actionable. ## Solution -Ahrefs went with OCaml for data processing at the scale of the Web. The company was in its infancy with a limited number of employees and little financial resources. The language provided a combination of qualities hard to find elsewhere: -- Native compilation -- High-level types for clear expression and compact code -- Solid and stable compiler -- Empathy for industrial users +OCaml addresses Ahrefs’ challenges, providing the tools needed to build a cohesive, scalable, and efficient technology stack. One of the most impactful features was its end-to-end type safety. Using tools like ATD, Ahrefs generated shared types for both frontend and backend development, ensuring consistency and reducing the likelihood of bugs. + +OCaml’s expressiveness allows Ahrefs to avoid relying on bloated, one-size-fits-all frameworks. Instead, the team could craft solutions tailored to their specific needs, resulting in more efficient and maintainable systems. This customization extended to their tech stack, which seamlessly integrated OCaml with complementary technologies like Melange, React, Clickhouse, MySQL, and Elasticsearch. + +The unified OCaml stack also fosters a more collaborative environment. By allowing all engineers to contribute across domains, Ahrefs broke down silos and improved overall productivity. The result was a streamlined development process that enabled the company to maintain high standards of output while keeping its team small and focused. + +## Lessons Learned + +Ahrefs’ journey with OCaml provides valuable insights for other companies considering a similar path: + +1. **Embrace Simplicity:** OCaml’s type system is intuitive yet powerful, enabling developers to build expressive and reliable applications without unnecessary complexity. +2. **Invest in Learning:** Training the team in OCaml is essential but rewarding. Tools like the Melange playground can make the onboarding process smoother and more approachable. +3. **Build Tailored Solutions:** By leveraging OCaml’s expressiveness, Ahrefs avoided generic frameworks in favor of custom-built solutions that addressed their unique needs. While this approach requires more ownership, it results in greater efficiency and precision. +4. **You Will need to Write Bindings:** Writing bindings can be complex, but resources from the community simplify this process. +5. **Adopt Gradually:** Starting with small contributions to libraries or limited integrations can help teams build confidence before transitioning fully to OCaml. -As the company grew and expanded its service offerings, they took the opportunity to write its website in OCaml (native OCaml for the backend, ReasonML for the frontend). This bold choice gave them a unique advantage. Thanks to the types shared across the entire stack, they can safely reason about data, from creation to final consumption. +## Open Source -## Results +Ahrefs is proud to support the OCaml ecosystem by contributing tools and libraries that benefit the broader community. These contributions include: -Ahrefs turns billions of websites into data, first stored into over 100PB of storage and then into valuable information for tens of thousands of customers worldwide. As the internet is becoming an increasingly competitive place, Ahrefs provides a vital service for companies running a business on the web. Ahrefs managed to face this challenge while keeping the company lean and efficient. +- **[Melange Recharts](https://github.com/ahrefs/melange-recharts):** Recharts bindings for Melange. +- **[Ahrefs DevKit](https://github.com/ahrefs/devkit):** Tools and utilities for writing applications. +- **[Melange Bindings](https://github.com/melange-community/bindings):** A community-driven resource for bindings. +- **[Melange JSON PPX](https://github.com/ahrefs/melange-json-ppx):** A replacement for the ATD runtime in BuckleScript. +- **[OCaml Community Tools](https://github.com/ocaml-community):** Contributions to widely used tools like `ocurl` and `ocaml-mariadb`. diff --git a/src/ocamlorg_web/lib/redirection.ml b/src/ocamlorg_web/lib/redirection.ml index e24294c4de..82188fbf36 100644 --- a/src/ocamlorg_web/lib/redirection.ml +++ b/src/ocamlorg_web/lib/redirection.ml @@ -327,6 +327,8 @@ let from_v2 = ("/docs/platform-principles", Url.tool_page "platform-principles"); ("/docs/platform-users", Url.tool_page "platform-users"); ("/docs/platform-roadmap", Url.tool_page "platform-roadmap"); + ( "/success-stories/peta-byte-scale-web-crawler", + Url.success_story "marketing-intelligence-tools-powered-by-big-data" ); ] let make ?(permanent = false) t =