How to identify technologies ?
In order to provide good diagrams, we need to automatically identify the technologies used in model elements that rely on a package manager (Maven, npm, pip, or any other). How can we do that?
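As a starting point, the raw material for detection is the list of dependencies declared in a package manager manifest. The sketch below is only an illustration, not aadarchi's implementation: it reads the standard `dependencies` and `devDependencies` keys of a `package.json` file. The function name `declared_dependencies` and the sample manifest are hypothetical. Deciding which of the returned names count as a "technology" remains the open question of this page.

```python
import json


def declared_dependencies(package_json_text: str) -> dict[str, str]:
    """Return {name: version_range} for runtime and dev dependencies."""
    manifest = json.loads(package_json_text)
    found: dict[str, str] = {}
    # "dependencies" and "devDependencies" are the standard npm manifest keys.
    for section in ("dependencies", "devDependencies"):
        found.update(manifest.get(section, {}))
    return found


# Hypothetical manifest, used only to illustrate the call.
SAMPLE = """
{
  "name": "demo-app",
  "dependencies": {"react": "^18.2.0", "express": "^4.18.2"},
  "devDependencies": {"jest": "^29.7.0"}
}
"""

if __name__ == "__main__":
    for name, version in sorted(declared_dependencies(SAMPLE).items()):
        print(name, version)
```

Every declared dependency comes back, test tooling included, which is exactly why a filtering criterion is needed on top of the manifest.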
We want to know, by asking developers directly (or through well-known websites), how they choose their technologies
Polls like
Provide a good overview of the state of the technology landscape, but say nothing about the decision processes
This first experiment should give us research directions: what are good ways for developers to choose their technologies? These research directions will in turn allow us to perform experiments.
We have no hypothesis here
If developers give us some queryable information sources.
To get ideas and opinions, we put the question to the company's developers, as well as to various forums (Reddit, Discord, Stack Overflow (failed), développez.com (still awaiting validation)). Below is a summary of what was said.
In general, the devs we interviewed didn't have a clear opinion, but had some interesting ideas. It was recommended that we visit a number of sites and forums to see what technologies were listed.
- Stack Overflow tech tags
- Stacks from stackshare.io
- Techempower
- GitHub/Google Advanced Search Reddit
- ChatGPT
- GitLab Auto DevOps and Red Hat OpenShift auto detection of technologies.
As for the devs interviewed directly, the most common response was that, for an architecture doc, the names of the language(s) and frameworks are enough. That is relevant, of course, but not precise enough.
#Forum message template
Hello everyone!
I'm working on a Maven project called aadarchi. It's a Maven archetype allowing you to easily create your agile architecture documentation using a mix of C4, Asciidoc and PlantUML.
So far it's been mostly focused on Java projects, and we're currently trying to make it work for JS, or maybe even Python. But there's a catch! In order to provide good diagrams, we need to be able to automatically identify technologies in model elements for which a package manager (Maven, NPM, pip...) is used. But how? How can we decide what is really a "technology" that deserves to be detected in a project (for example in package.json files) and used in its architecture documentation? Of course, we already thought about "just the language and framework", but we need some advice...
So: what are your thoughts?
Thank you!
--> QUESTION "OFF TOPIC", immediately closed.
We got a list of websites to analyze, but no really conclusive answer.
We suppose the subject relies heavily on developer culture and requires more analysis, both of information sources and of culture dynamics (which is way off-topic for a pure technology research subject).
A company dev suggested using a scraping script (Python Scrapy) to get the list of technologies from relevant sites. For him, it was worth a shot to try it on https://stackshare.io/ or maybe https://techdetector.de/welcome
- There is a StackShare dataset available at coresignal
- A scrapy-based stackshare scraper is also available
It could provide some complementary data
There is no hypothesis, only a search for data
Considering we don't know what we're looking for, this experiment has been postponed
While asking devs on Slack, I got this answer: "You might find some answers by looking at the way Gitlab Auto DevOps or Red Hat OpenShift "auto detects" the underlying language and technologies of a project." ~S.R
https://about.gitlab.com/stages-devops-lifecycle/auto-devops/
By reading the available code to understand how they analyze the applications to deploy
Considering we have no knowledge of the GitLab platform, this experiment has been postponed.
In the end, we decided that scraping sites is a good solution for collecting as many technologies as possible on the most objective criteria possible. But we decided to do it more thoroughly, and on two different sites: mvnrepository and Stack Overflow. We used two tools, Scrapy (Python) for mvnrepository and a REST API for Stack Overflow, filtering technologies according to their popularity.
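The popularity filter on the Stack Overflow side can be sketched as follows. This is a hedged illustration, not the script we actually ran: it assumes the public Stack Exchange API `/2.3/tags` endpoint, whose items carry a `name` and a `count` (number of questions). The function names and the `min_questions` threshold are hypothetical.

```python
import gzip
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.stackexchange.com/2.3/tags"


def tags_url(page: int = 1, pagesize: int = 100) -> str:
    """Build the URL listing Stack Overflow tags, most popular first."""
    query = urllib.parse.urlencode({
        "order": "desc",
        "sort": "popular",
        "site": "stackoverflow",
        "page": page,
        "pagesize": pagesize,
    })
    return f"{API_ROOT}?{query}"


def fetch_tags(page: int = 1) -> list[dict]:
    """Download one page of tags (network call, not run on import)."""
    with urllib.request.urlopen(tags_url(page)) as response:
        raw = response.read()
        # The API compresses its responses; urllib does not decompress them.
        if response.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw)["items"]


def popular_tags(items: list[dict], min_questions: int) -> list[str]:
    """Keep the tag names with at least min_questions questions."""
    return [item["name"] for item in items if item["count"] >= min_questions]


if __name__ == "__main__":
    # The threshold is an arbitrary value chosen for illustration.
    print(popular_tags(fetch_tags(), min_questions=100_000))
```

The threshold is the subjective part: any cut-off value is an assumption about what "popular enough to be a technology" means.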
That there is a correlation between the number of downloads and the number of questions asked on Stack Overflow
- Stack Overflow provides a BigQuery dataset containing all their data up to 2023
It should allow us to validate a hypothesis: there is a correlation between Stack Overflow activity and the download count
- Download questions count per month
- Download download count per month
- Validate correlation
If we observe a significant correlation between both sides
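The "validate correlation" step can be sketched with a Pearson correlation coefficient computed over two monthly series of equal length. Pearson is an assumption here, since the experiment does not name a specific measure, and the figures in the example are made up purely to show the call.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical monthly figures for one technology, not real measurements.
    questions_per_month = [120, 135, 150, 160, 158, 170]
    downloads_per_month = [10_000, 11_500, 12_900, 14_200, 14_000, 15_300]
    print(round(pearson(questions_per_month, downloads_per_month), 3))
```

A coefficient close to 1 or -1 indicates a strong linear relationship. Whether it is significant also depends on the number of months compared, which this sketch does not test.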
- mvnrepository doesn't provide download count information for Java artifacts
- The Stack Overflow dataset is only available in BigQuery
- mvnrepository and npmjs information is formatted differently
- Since these figures are among the best-formatted ones we can get, we started another initiative at Zenika to analyze them in more depth.
- We also built an automated download count extractor for the best-known artifacts of all languages at aadarchi-technology-detector. This project provides automated up-to-the-month download figures for npmjs and Python dependencies (we're still searching for a way to get download figures for Java artifacts)
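For npmjs, monthly download figures can be read from the public npm downloads API. The sketch below is not the aadarchi-technology-detector code. It assumes the `https://api.npmjs.org/downloads/point/last-month/<package>` endpoint, which returns a JSON object with a `downloads` field, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

NPM_DOWNLOADS = "https://api.npmjs.org/downloads/point/last-month/"


def npm_downloads_url(package: str) -> str:
    """Build the download-count URL, keeping '@' and '/' for scoped packages."""
    return NPM_DOWNLOADS + urllib.parse.quote(package, safe="@/")


def parse_downloads(payload: str) -> int:
    """Extract the download count from the API's JSON answer."""
    return int(json.loads(payload)["downloads"])


def fetch_npm_downloads(package: str) -> int:
    """Query the npm downloads API (network call, not run on import)."""
    with urllib.request.urlopen(npm_downloads_url(package)) as response:
        return parse_downloads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_npm_downloads("react"))
```

Keeping URL building and parsing apart from the network call makes the first two functions testable offline.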
Use this template to describe a new experiment