====== Reasons (Babak) ======

I fully understand the motives behind your suggestions, but because you lack the context, you are missing the point. On this page I explain that context so that you understand why we need the projects to be done [[babak|exactly as I described]].

==== PHP Pipeline ====
We are a startup that has developed a PHP-based system for processing AI API calls. PHP may be an outdated language, but many more people know it well than know Python, and, most importantly, we know it well.
There is no point in arguing about this; it is simply a fact.

All the scripts that you produce will run from the command line. One day they may be converted into background PHP functions and become part of our backend services, i.e. they could be run inside any web app in a controlled, scalable way. Until then they will remain plain command-line scripts.

Therefore we also need the PHP scripts to call the API directly via **HTTP requests**, not through a library such as a LangChain port.
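To illustrate what "just HTTP requests" could look like, here is a minimal sketch using only PHP's built-in cURL and JSON functions. The endpoint URL, the bearer-token header and the payload shape are placeholders, not the actual API of any provider; the function names ''build_json_request'' and ''send_json_request'' are made up for this example.

```php
<?php
declare(strict_types=1);

/**
 * Build the pieces of a JSON POST request. Pure function, no network access.
 * ASSUMPTION: the API expects a bearer token and a JSON body; adjust the
 * headers to whatever the real API requires.
 */
function build_json_request(string $url, string $apiKey, array $payload): array
{
    return [
        'url'     => $url,
        'headers' => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $apiKey,
        ],
        'body'    => json_encode($payload, JSON_THROW_ON_ERROR),
    ];
}

/**
 * Send a request built by build_json_request() with the cURL extension
 * and return the decoded JSON response as an associative array.
 */
function send_json_request(array $request, int $timeoutSeconds = 30): array
{
    $ch = curl_init($request['url']);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_HTTPHEADER     => $request['headers'],
        CURLOPT_POSTFIELDS     => $request['body'],
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $raw    = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    $error  = curl_error($ch);

    if ($raw === false) {
        throw new RuntimeException('HTTP request failed: ' . $error);
    }
    if ($status < 200 || $status >= 300) {
        throw new RuntimeException("API returned HTTP $status: $raw");
    }
    return json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
}
```

Keeping the request building separate from the sending means the same two functions can be called from a command-line script today and from a background function later without changes.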

==== Search engine ====
The DB that we scrape from the website is a set of medical articles and questions that we want to search semantically.

The user will give us a medical report, we embed it and find the most relevant articles and questions using Pinecone.
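As a rough sketch of the lookup step, the function below builds a raw HTTP query for Pinecone without any client library. It assumes Pinecone's REST query endpoint (''POST https://<index-host>/query'' with an ''Api-Key'' header and the fields ''vector'', ''topK'' and ''includeMetadata''); please verify this against the current Pinecone documentation. The index host, the key and the function name ''build_pinecone_query'' are placeholders.

```php
<?php
declare(strict_types=1);

/**
 * Build a Pinecone similarity query as a plain HTTP request description.
 * ASSUMPTION: $vector is the embedding of the medical report, already
 * obtained from the embedding API. $indexHost is a placeholder for the
 * host name of the index.
 */
function build_pinecone_query(string $indexHost, string $apiKey, array $vector, int $topK = 5): array
{
    return [
        'url'     => 'https://' . $indexHost . '/query',
        'headers' => [
            'Content-Type: application/json',
            'Api-Key: ' . $apiKey,
        ],
        'body'    => json_encode([
            'vector'          => array_values($vector), // force a JSON list
            'topK'            => $topK,                 // number of matches to return
            'includeMetadata' => true,                  // e.g. to get the article ID back
        ], JSON_THROW_ON_ERROR),
    ];
}
```

The returned array can be sent with PHP's cURL functions (''CURLOPT_HTTPHEADER'', ''CURLOPT_POSTFIELDS''), which keeps the whole lookup a plain HTTP request.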

We are not implementing a generic search engine, and we are not going to scrape many such sites. We are not even going to update the set for now. So we need a one-time job that scrapes and indexes just those 2 sets of data.

=== Searching ===
We get a medical report. From it we distill a definition of the health problem, embed it, and then find PARAGRAPHS of text that talk about something similar. We then recommend the article containing that paragraph to the user to read.

But we don't know which approach will work best. Will it be better to embed the whole article and search for that? Or to embed parts of it and search for those?
We don't know, and we don't think that anyone knows. We need to test and find out. To be able to test it, we need both kinds of vectors: for the whole article as well as for its parts.
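One possible way to prepare both kinds of vectors is sketched below: each article yields one item for the full text plus one item per paragraph, and every item keeps the article ID so a paragraph hit can be traced back to its article. Splitting on blank lines, the ID scheme (''#full'', ''#p0'', ''#p1'', ...) and the function name ''build_embedding_items'' are assumptions made for this example, not a fixed requirement.

```php
<?php
declare(strict_types=1);

/**
 * Turn one scraped article into the list of texts to embed:
 * the whole article plus each of its paragraphs.
 * ASSUMPTION: paragraphs are separated by one or more blank lines.
 */
function build_embedding_items(string $articleId, string $text): array
{
    $text = trim($text);

    // Item for the whole article.
    $items = [[
        'id'         => $articleId . '#full',
        'kind'       => 'article',
        'article_id' => $articleId,
        'text'       => $text,
    ]];

    // One item per non-empty paragraph.
    $paragraphs = preg_split('/\R\s*\R/', $text) ?: [];
    $n = 0;
    foreach ($paragraphs as $paragraph) {
        $paragraph = trim($paragraph);
        if ($paragraph === '') {
            continue;
        }
        $items[] = [
            'id'         => $articleId . '#p' . $n,
            'kind'       => 'paragraph',
            'article_id' => $articleId,
            'text'       => $paragraph,
        ];
        $n++;
    }
    return $items;
}
```

With the ''kind'' field stored as metadata, the two approaches can be compared by running the same query against only the ''article'' vectors and then against only the ''paragraph'' vectors.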