# GLMA Optimized Media API

A high-performance FastAPI application providing advanced Text-to-Speech (TTS) and Speech-to-Text (STT) services. This project is specifically optimized for mixed Arabic and English content, featuring seamless transitions and high-quality vocalization.

## 🚀 Features

- **Mixed-Language TTS**: Seamlessly switch between Arabic and English using high-quality Multilingual voices.
- **Advanced Arabic Tashkeel**: Automatic vocalization of Arabic text using `Mishkal` and `PyArabic`.
- **Natural Transitions**: Uses `pydub` for silence trimming and crossfading between language segments to ensure human-like speech flow.
- **Sentiment-Aware Speech**: Automatically adjusts speech rate, pitch, and volume based on the sentiment of the text.
- **Fast Transcription**: Powered by `Faster-Whisper` for highly accurate and rapid speech-to-text conversion.
- **Parallel Processing**: TTS generation is parallelized to ensure fast responses even for long documents.
- **Modular Architecture**: Clean code structure with a dedicated service layer for easy maintenance.
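
The mixed-language handling above depends on splitting input text into per-language runs before synthesis. The sketch below illustrates the general idea using Unicode ranges to classify Arabic vs. Latin tokens; `split_by_language` is a hypothetical helper for illustration, not the project's actual `utils.py` implementation:

```python
import re

# The Arabic Unicode block (U+0600-U+06FF) covers Arabic letters and punctuation.
ARABIC_RE = re.compile(r"[\u0600-\u06FF]")

def split_by_language(text: str) -> list[tuple[str, str]]:
    """Split text into (language, segment) runs, tagged 'ar' or 'en'.

    Consecutive tokens of the same language are merged into one segment,
    so each segment can be sent to the matching TTS voice in one call.
    """
    segments: list[tuple[str, str]] = []
    for token in text.split():
        lang = "ar" if ARABIC_RE.search(token) else "en"
        if segments and segments[-1][0] == lang:
            # Extend the current run instead of starting a new segment.
            segments[-1] = (lang, segments[-1][1] + " " + token)
        else:
            segments.append((lang, token))
    return segments
```

Each resulting segment can then be synthesized with a language-appropriate voice and the pieces joined with crossfades, as described above.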

## 🛠 Prerequisites

- **Python 3.13+** (the project targets recent Python language features).
- **FFmpeg**: Required for audio processing and merging.
  - **Windows**: [Download FFmpeg](https://ffmpeg.org/download.html) and add the `bin` folder to your System PATH.
  - **Linux (Debian/Ubuntu)**: `sudo apt install ffmpeg`

## 📦 Installation

1. **Clone the repository**:

   ```bash
   git clone <repository-url>
   cd whisper_new
   ```

2. **Create and activate a virtual environment**:

   ```bash
   python -m venv venv
   # Windows:
   venv\Scripts\activate
   # Linux:
   source venv/bin/activate
   ```

3. **Install dependencies**:
   - For **Windows**:
     ```bash
     pip install -r requirements.windows.txt
     ```
   - For **Linux**:
     ```bash
     pip install -r requirements.linux.txt
     ```

4. **Environment Configuration**:
   Create a `.env` file in the root directory (or use the existing one):
   ```env
   APP_HOST=0.0.0.0
   APP_PORT=8080
   WHISPER_MODEL_SIZE=small
   ```
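
These settings are read at startup. A minimal sketch of how they might be loaded with the standard library (the project may use `python-dotenv` or Pydantic settings instead; `load_config` is a hypothetical helper shown for illustration):

```python
import os

def load_config() -> dict:
    """Read server settings from the environment, falling back to the
    documented defaults when a variable is unset."""
    return {
        "host": os.getenv("APP_HOST", "0.0.0.0"),
        "port": int(os.getenv("APP_PORT", "8080")),
        "whisper_model": os.getenv("WHISPER_MODEL_SIZE", "small"),
    }
```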

## 🖥 Running the Server

Start the API server (`server.py` launches Uvicorn internally):

```bash
python server.py
```

The API will be available at `http://localhost:8080`. You can access the interactive documentation at `http://localhost:8080/docs`.

## 📡 API Endpoints

- `POST /speak/`: Converts text to speech. Supports `Auto` mode for mixed languages.
- `POST /transcribe/`: Transcribes uploaded audio files to text.
- `GET /available_voices/`: Lists all available voices from the Edge TTS service.
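
As an example of calling the TTS endpoint from Python, the sketch below builds a `POST /speak/` request with the standard library. The JSON field names (`text`, `mode`) are assumptions for illustration; check the interactive docs at `/docs` for the actual request schema:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"

def build_speak_request(text: str, mode: str = "Auto") -> urllib.request.Request:
    """Build (but do not send) a POST request to the /speak/ endpoint.

    Field names in the payload are illustrative; the real schema is
    documented by the server at /docs.
    """
    payload = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/speak/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To send it against a running server:
#   with urllib.request.urlopen(build_speak_request("Hello مرحبا")) as resp:
#       audio_bytes = resp.read()
```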

## 📂 Project Structure

- `server.py`: API entry point and route definitions.
- `services/`: Business logic for TTS and Transcription.
- `utils.py`: Helper functions for sentiment analysis and text splitting.
- `tashkeel.py`: Arabic text normalization and vocalization logic.
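
The sentiment-aware speech feature amounts to mapping a sentiment score to the prosody strings Edge TTS accepts (signed percentage offsets for rate and volume, a Hz offset for pitch). The sketch below is a hypothetical version of such a mapping; the actual thresholds and formula in `utils.py` may differ:

```python
def prosody_for_sentiment(score: float) -> dict[str, str]:
    """Map a sentiment score in [-1.0, 1.0] to Edge TTS prosody strings.

    Positive sentiment yields slightly faster, higher, louder speech;
    negative sentiment the opposite. Scaling factors are illustrative.
    """
    score = max(-1.0, min(1.0, score))   # clamp to the valid range
    rate = round(score * 15)             # up to +/-15% speaking rate
    pitch = round(score * 10)            # up to +/-10 Hz pitch shift
    volume = round(score * 10)           # up to +/-10% volume change
    return {
        "rate": f"{rate:+d}%",
        "pitch": f"{pitch:+d}Hz",
        "volume": f"{volume:+d}%",
    }
```

Keeping the offsets small preserves naturalness; large rate or pitch shifts tend to sound robotic.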

## 📄 License

This project is licensed under the MIT License.
