Touchless Browse

What does it do

Welcome to Touchless Browse, a web browser that brings the science-fiction idea of hands-free internet navigation to reality. The project combines natural language processing, computer vision, and speech recognition and transcription models so that you can browse the web without touching a mouse or keyboard.

Key Features:

Voice-Powered Google Searches: Search Google by voice, with your speech transcribed by OpenAI's Whisper model and the resulting links summarized by GPT.
Voice & Hand Gesture Navigation: Scroll and click through sites using just your voice. Plus, control a cursor with your index finger movements captured by your webcam, mimicking futuristic "holographic" control.
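
As a rough illustration of the voice navigation described above, the sketch below turns a transcribed phrase into a structured command. The phrases, the `Command` type, and the `parse_command` function are assumptions made for this example and are not taken from the project's code.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """A parsed voice command (hypothetical structure for illustration)."""
    action: str          # "scroll", "click", "search", or "unknown"
    argument: str = ""   # scroll direction, search query, or the raw text


def parse_command(transcript: str) -> Command:
    """Map a transcribed phrase to a command using simple patterns."""
    # Normalize case and drop trailing punctuation added by the transcriber.
    text = transcript.strip().lower().rstrip(".!?")

    match = re.match(r"scroll (up|down)\b", text)
    if match:
        return Command("scroll", match.group(1))

    if text in {"click", "click here", "select"}:
        return Command("click")

    match = re.match(r"(?:search for|search|google) (.+)", text)
    if match:
        return Command("search", match.group(1))

    # Anything unrecognized is passed through so the caller can decide.
    return Command("unknown", text)
```

For example, `parse_command("Scroll down.")` yields `Command("scroll", "down")`, which a caller could then hand to whatever performs the actual scrolling.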

Touchless Browse rethinks how people consume information and navigate the internet.

How we built it

Touchless Browse is built on a blend of voice recognition, natural language processing, and gesture recognition technologies. Here's how we pieced it together:

Voice Input: We used OpenAI's Whisper model to transcribe the user's voice input into text.
Natural Language Processing: We used the GPT-3.5 API to process user input and organize web search results returned by the Serper API (Google Search results). The results are reformatted and read back to the user through Python text-to-speech libraries.
Audio Processing: We used PyAudio, FFmpeg, and pydub for audio processing.
Gesture Recognition: We started from OpenCV-based facial recognition and modified it to track the user's finger movements, which control the cursor via PyAutoGUI.
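
To make the gesture-to-cursor step more concrete, here is a minimal sketch of how a tracked fingertip position might be converted into screen coordinates. It assumes the tracker reports the fingertip as normalized coordinates between 0 and 1; the `CursorMapper` class, the mirroring, and the exponential smoothing are illustrative assumptions, not the project's actual implementation.

```python
class CursorMapper:
    """Map a normalized fingertip position to smoothed screen coordinates."""

    def __init__(self, screen_w, screen_h, smoothing=0.5, mirror=True):
        if not 0.0 < smoothing <= 1.0:
            raise ValueError("smoothing must be in (0, 1]")
        self.screen_w = screen_w
        self.screen_h = screen_h
        self.smoothing = smoothing  # 1.0 means no smoothing at all
        self.mirror = mirror        # webcams usually show a mirrored view
        self._x = None
        self._y = None

    def update(self, nx, ny):
        """Return the new (x, y) pixel position for a fingertip at (nx, ny)."""
        # Clamp to the frame so a fingertip leaving the view pins to an edge.
        nx = min(max(nx, 0.0), 1.0)
        ny = min(max(ny, 0.0), 1.0)
        if self.mirror:
            nx = 1.0 - nx

        target_x = nx * (self.screen_w - 1)
        target_y = ny * (self.screen_h - 1)

        if self._x is None:
            # First sample: jump straight to the target.
            self._x, self._y = target_x, target_y
        else:
            # Exponential smoothing reduces jitter from noisy tracking.
            self._x += self.smoothing * (target_x - self._x)
            self._y += self.smoothing * (target_y - self._y)

        return round(self._x), round(self._y)
```

In a live loop, the returned pair could be passed to `pyautogui.moveTo(x, y)`, with the screen dimensions taken from `pyautogui.size()`. A lower `smoothing` value gives a steadier cursor at the cost of more lag.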

Challenges we ran into

The journey wasn't smooth, and we faced several challenges:

Eye Tracking for Scrolling: Initially, we attempted to implement scrolling using eye tracking. However, eye movements are small and fine-grained, and tracking them reliably while keeping webcam calibration to a minimum proved impractical.
Gesture Recognition: Implementing index finger movement recognition with computer vision libraries such as OpenCV was a substantial challenge.
Speech Recognition Limitations: We encountered issues with the Python SpeechRecognition (speech-to-text) library, leading us to opt for OpenAI's Whisper API for its efficiency.
Managing Complexity: Coordinating numerous functions across various branches was a significant task, making our documentation process crucial.

Despite these challenges, we emerged with a rewarding end product.

Accomplishments that we're proud of

Integration of Advanced AI Technologies: Successfully integrating computer vision, voice recognition, and natural language processing.
Gesture Recognition Module: Developing a module that allows basic navigation through simple hand movements.
Realizing a Sci-Fi Dream: Bringing to life a vision of a keyboardless and mouseless interface for accessing information.

What we learned

Integration of AI Technologies: The importance of integrating various AI technologies to build practical applications.
Product Differentiation: The necessity of distinguishing our product in the market, leading to the unique touchless feature.
Limits of AI Models: Understanding that AI models are tools that require human ingenuity to be fully utilized in applications.

What's next for Touchless Browse

Looking forward, we aim to:

Enhance Gesture Recognition: Expand the gesture recognition capabilities for more complex commands.
Improve Stability and Accuracy: Focus on the stability and accuracy of voice and gesture inputs.
Enhance Virtual Keyboard and Mouse Functions: Improve the virtual keyboard and mouse stability and scrolling functionalities.
AI-Driven Content Summarization: Integrate AI-driven content summarization for enhanced browsing.
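
One practical detail of content summarization is that page text is often too long to send to a language model in a single request. The sketch below shows one possible way to split text into size-limited chunks first; the `chunk_text` function and the character limit are assumptions for illustration and do not describe a planned design.

```python
from __future__ import annotations


def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking between words."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        # A single over-long word becomes its own chunk rather than being cut.
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be summarized separately, and the partial summaries combined into one final summary.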

Our ultimate goal is to create a fully immersive, voice- and gesture-based browsing experience accessible to all users.

Security Note

For security reasons, API keys located in key.py have been omitted from the public version of our repository.
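
Since key.py is not included, anyone running the project has to supply their own keys. The following is a minimal sketch of one way to do that: read a key from an environment variable first, then fall back to a local key.py. The variable names inside key.py are not documented here, so the names used below (such as `OPENAI_API_KEY`) are assumptions.

```python
import importlib
import os


def load_api_key(name: str, env=os.environ) -> str:
    """Return an API key from the environment, or from a local key.py."""
    # Prefer environment variables so keys never need to live in the repo.
    value = env.get(name, "").strip()
    if value:
        return value

    # Fall back to a local key.py (assumed to define a variable called `name`).
    try:
        module = importlib.import_module("key")
    except ImportError:
        module = None
    value = getattr(module, name, "") if module else ""

    if not value:
        raise RuntimeError(
            f"Missing API key {name!r}: set it as an environment variable "
            "or define it in a local key.py that is excluded from version control."
        )
    return value
```

A caller might use `load_api_key("OPENAI_API_KEY")` at startup, so that a missing key fails early with a clear message.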



kia8804/TouchlessBrowseFilteredFinal
