# Fabric Copy Job POC - MySQL to Lakehouse

## Overview

This POC demonstrates **fully programmatic creation** of Microsoft Fabric Copy Jobs that copy data from a MySQL database (AWS RDS) to a Fabric Lakehouse, using the Fabric REST API. No portal interaction is required after the initial setup.

**Created:** 2026-02-11
**Branch:** `mysql-connection-pipeline_v2`

---

## What Was Built

An end-to-end pipeline that:
1. Authenticates with Microsoft Fabric via OAuth2 client credentials
2. Creates a MySQL connection in Fabric via the Connections API
3. Builds a Copy Job definition with source/destination/activities
4. Creates the Copy Job item in a Fabric workspace
5. Triggers the Copy Job run on demand
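Step 1 uses the standard OAuth2 client-credentials flow against Microsoft Entra ID. A minimal sketch of the token request this corresponds to (the tenant/client values are placeholders, not the POC's real values; the actual implementation lives in the PHP service):

```python
# Sketch of the OAuth2 client-credentials token request behind Step 1.
# TENANT_ID / CLIENT_ID / CLIENT_SECRET are placeholders, not real values.
from urllib.parse import urlencode

TENANT_ID = "<tenant-guid>"
CLIENT_ID = "<app-registration-guid>"
CLIENT_SECRET = "<client-secret>"

# Entra ID v2.0 token endpoint for the tenant
token_url = f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token"

# Fabric's ".default" scope requests the app registration's configured permissions
form_body = urlencode({
    "grant_type": "client_credentials",
    "client_id": CLIENT_ID,
    "client_secret": CLIENT_SECRET,
    "scope": "https://api.fabric.microsoft.com/.default",
})
```

The resulting bearer token is then sent on every Fabric API call in steps 2 through 5.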

---

## Files Modified / Created

### New Files

| File | Purpose |
|------|---------|
| `app/Services/FabricCopyJobService.php` | Main service class. Contains `createMySqlToLakehouseCopyJob()` (end-to-end), `createMySqlConnection()`, definition builders, and inspection utilities. |
| `app/Console/Commands/CreateFabricCopyJobCommand.php` | Artisan command `fabric:copy-job` with 5 actions: `create`, `inspect`, `connections`, `run`, `extract-connection`. |
| `FABRIC_COPY_JOB_POC.md` | This documentation file. |

### Modified Files

| File | Changes |
|------|---------|
| `app/Helpers/ApiHelper.php` | Added 7 Copy Job API methods: `createFabricCopyJob()`, `getFabricCopyJobs()`, `getFabricCopyJob()`, `getFabricCopyJobDefinition()`, `updateFabricCopyJobDefinition()`, `runFabricCopyJob()`, `deleteFabricCopyJob()`. |

---

## Artisan Commands

All commands run via Docker:

```bash
docker exec prolytics_web php artisan fabric:copy-job <action> [options]
```

### Create a Copy Job (Full End-to-End)

```bash
# Create with 3 tables, Batch mode (full copy)
docker exec prolytics_web php artisan fabric:copy-job create --mode=Batch --tables="customers,orders,products"

# Create with CDC (incremental) mode
docker exec prolytics_web php artisan fabric:copy-job create --mode=CDC --tables="order_items"

# Create and immediately run
docker exec prolytics_web php artisan fabric:copy-job create --tables="orders" --run

# Reuse an existing MySQL connection instead of creating a new one
docker exec prolytics_web php artisan fabric:copy-job create --tables="orders" --connection-id=8f9f67dd-86c2-47a0-92e5-3ec238eef888

# Create a shell job (no tables/definition) — configure in portal later
docker exec prolytics_web php artisan fabric:copy-job create
```

### Inspect Existing Copy Jobs

```bash
# List all copy jobs in the workspace
docker exec prolytics_web php artisan fabric:copy-job inspect

# Include full decoded definitions (useful for debugging)
docker exec prolytics_web php artisan fabric:copy-job inspect --with-definitions
```

### Run a Copy Job On Demand

```bash
docker exec prolytics_web php artisan fabric:copy-job run --job-id=4eaadf6b-ccc7-4961-a431-b0e574e18432
```

### List Fabric Connections

```bash
docker exec prolytics_web php artisan fabric:copy-job connections
```

### Extract Connection ID from Portal-Created Job

```bash
# Useful when you create a Copy Job in the Fabric portal and need the connection ID
docker exec prolytics_web php artisan fabric:copy-job extract-connection --job-id=eb7570dd-5d35-4997-948b-656d05783d0d
```

---

## Key Discoveries (Not in Microsoft Documentation)

These were discovered through trial and error and by reverse-engineering existing Copy Job definitions:

| Discovery | Detail |
|-----------|--------|
| **MySQL Connections API type** | Must be exactly `"MySql"` (case-sensitive). `"MySQL"`, `"mysql"`, `"Mysql"`, `"MySql.Database"` are all rejected. |
| **Copy Job source type for MySQL** | `"MySqlTable"` |
| **Connection settings type** | `"MySql"` with `externalReferences` (not `typeProperties`) |
| **Table name format** | MySQL tables must be backtick-wrapped: `` `table_name` `` |
| **Connection creation method** | `creationMethod` must also be `"MySql"` |
| **Run API request body** | Must send `{}` (empty JSON object). Sending `[]` (empty array) returns a 400 error. |
| **Lakehouse destination** | Uses `typeProperties` with `workspaceId`, `artifactId`, `rootFolder: "Tables"` |
| **CDC mode** | Uses `jobMode: "CDC"` with `changeDataSettings.readMethod: "SnapshotPlusIncremental"` |
| **Portal-created connections** | MySQL connections created via the Fabric portal do NOT appear in the Connections API list. API-created connections DO appear. |
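
Two of these gotchas are easy to encode as small helpers. A Python sketch (function names are illustrative, not the actual methods in `FabricCopyJobService.php`):

```python
import json

def backtick_wrap(table: str) -> str:
    """MySQL source tables must be backtick-wrapped in the Copy Job definition."""
    return table if table.startswith("`") else f"`{table}`"

def run_request_body() -> str:
    """The run API requires an empty JSON *object*; an empty array returns 400."""
    return json.dumps({})  # "{}" -- never "[]"
```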

---

## Copy Job Definition Schema (Proven)

This is the exact `copyjob-content.json` schema that Fabric accepts for MySQL-to-Lakehouse:

```json
{
  "properties": {
    "jobMode": "Batch",
    "source": {
      "type": "MySqlTable",
      "connectionSettings": {
        "type": "MySql",
        "externalReferences": {
          "connection": "<mysql-connection-guid>"
        }
      }
    },
    "destination": {
      "type": "LakehouseTable",
      "connectionSettings": {
        "type": "Lakehouse",
        "typeProperties": {
          "workspaceId": "<workspace-guid>",
          "artifactId": "<lakehouse-artifact-guid>",
          "rootFolder": "Tables"
        }
      }
    },
    "policy": {
      "timeout": "0.12:00:00"
    }
  },
  "activities": [
    {
      "properties": {
        "source": {
          "datasetSettings": {
            "table": "`table_name`"
          }
        },
        "destination": {
          "writeBehavior": "Append",
          "datasetSettings": {
            "table": "destination_table_name"
          }
        },
        "enableStaging": false,
        "translator": {
          "type": "TabularTranslator"
        },
        "typeConversionSettings": {
          "typeConversion": {
            "allowDataTruncation": true,
            "treatBooleanAsNumber": false
          }
        }
      }
    }
  ]
}
```
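
A builder for this schema can be sketched as follows (names are illustrative; the real definition builders are in `FabricCopyJobService.php`). It reproduces the structure above, one activity per source table:

```python
def build_copy_job_definition(connection_id, workspace_id, lakehouse_id,
                              tables, job_mode="Batch"):
    """Construct the copyjob-content.json structure shown above."""
    return {
        "properties": {
            "jobMode": job_mode,
            "source": {
                "type": "MySqlTable",
                "connectionSettings": {
                    "type": "MySql",
                    "externalReferences": {"connection": connection_id},
                },
            },
            "destination": {
                "type": "LakehouseTable",
                "connectionSettings": {
                    "type": "Lakehouse",
                    "typeProperties": {
                        "workspaceId": workspace_id,
                        "artifactId": lakehouse_id,
                        "rootFolder": "Tables",
                    },
                },
            },
            "policy": {"timeout": "0.12:00:00"},
        },
        "activities": [
            {
                "properties": {
                    # Source tables are backtick-wrapped; destinations are not.
                    "source": {"datasetSettings": {"table": f"`{t}`"}},
                    "destination": {
                        "writeBehavior": "Append",
                        "datasetSettings": {"table": t},
                    },
                    "enableStaging": False,
                    "translator": {"type": "TabularTranslator"},
                    "typeConversionSettings": {
                        "typeConversion": {
                            "allowDataTruncation": True,
                            "treatBooleanAsNumber": False,
                        }
                    },
                }
            }
            for t in tables
        ],
    }
```

The definition is then base64-encoded as `copyjob-content.json` inside the item-creation payload.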

---

## MySQL Connection Payload (Proven)

```json
{
  "displayName": "MySQL - fabricdb",
  "connectivityType": "ShareableCloud",
  "connectionDetails": {
    "type": "MySql",
    "creationMethod": "MySql",
    "parameters": [
      { "dataType": "Text", "name": "server", "value": "host:port" },
      { "dataType": "Text", "name": "database", "value": "database_name" }
    ]
  },
  "credentialDetails": {
    "singleSignOnType": "None",
    "connectionEncryption": "NotEncrypted",
    "skipTestConnection": false,
    "credentials": {
      "credentialType": "Basic",
      "username": "user",
      "password": "pass"
    }
  },
  "privacyLevel": "Organizational"
}
```
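
The same payload expressed as a builder sketch (illustrative Python, mirroring `createMySqlConnection()`; both `type` and `creationMethod` must be exactly `"MySql"` per the discoveries above):

```python
def build_mysql_connection_payload(display_name, host, port,
                                   database, username, password):
    """Construct the Connections API payload shown above."""
    return {
        "displayName": display_name,
        "connectivityType": "ShareableCloud",
        "connectionDetails": {
            "type": "MySql",            # case-sensitive
            "creationMethod": "MySql",  # must match the type
            "parameters": [
                {"dataType": "Text", "name": "server", "value": f"{host}:{port}"},
                {"dataType": "Text", "name": "database", "value": database},
            ],
        },
        "credentialDetails": {
            "singleSignOnType": "None",
            "connectionEncryption": "NotEncrypted",
            "skipTestConnection": False,
            "credentials": {
                "credentialType": "Basic",
                "username": username,
                "password": password,
            },
        },
        "privacyLevel": "Organizational",
    }
```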

---

## API Endpoints Used

| Operation | Method | Endpoint |
|-----------|--------|----------|
| Create Connection | POST | `https://api.fabric.microsoft.com/v1/connections` |
| Create Copy Job | POST | `https://api.fabric.microsoft.com/v1/workspaces/{wsId}/copyJobs` |
| List Copy Jobs | GET | `https://api.fabric.microsoft.com/v1/workspaces/{wsId}/copyJobs` |
| Get Copy Job | GET | `https://api.fabric.microsoft.com/v1/workspaces/{wsId}/copyJobs/{id}` |
| Get Definition | POST | `https://api.fabric.microsoft.com/v1/workspaces/{wsId}/copyJobs/{id}/getDefinition` |
| Update Definition | POST | `https://api.fabric.microsoft.com/v1/workspaces/{wsId}/copyJobs/{id}/updateDefinition` |
| Run On Demand | POST | `https://api.fabric.microsoft.com/v1/workspaces/{wsId}/items/{id}/jobs/instances?jobType=Execute` |
| Delete Copy Job | DELETE | `https://api.fabric.microsoft.com/v1/workspaces/{wsId}/copyJobs/{id}` |

---

## Static Configuration (POC)

These values are hardcoded in `FabricCopyJobService.php` for the POC and should be made dynamic:

| Constant | Value | Source |
|----------|-------|--------|
| `WORKSPACE_ID` | `67543918-d5b0-49c5-8fe3-1fbeeb402c90` | "Prolytics Connect US DEV2" workspace URL |
| `LAKEHOUSE_ARTIFACT_ID` | `0991983e-0d16-4f94-aec2-2cddfa1702aa` | "Destination_Holiday" lakehouse URL |
| `MYSQL_HOST` | `fabricdb.cuhktiypxhzg.us-east-1.rds.amazonaws.com` | `test-aws-mysql-connection.php` |
| `MYSQL_PORT` | `3306` | `test-aws-mysql-connection.php` |
| `MYSQL_DATABASE` | `fabricdb` | `test-aws-mysql-connection.php` |
| `MYSQL_USERNAME` | `fabricusername` | `test-aws-mysql-connection.php` |
| `MYSQL_PASSWORD` | `fabricpwd` | `test-aws-mysql-connection.php` |
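
One possible shape for making these dynamic (next-steps item 1) is environment-backed configuration. A hedged sketch; the variable names below are illustrative, not the project's actual config keys:

```python
import os

# Replace the hardcoded constants with environment lookups; the angle-bracket
# defaults are placeholders, not the POC's real values.
FABRIC_CONFIG = {
    "workspace_id": os.getenv("FABRIC_WORKSPACE_ID", "<workspace-guid>"),
    "lakehouse_artifact_id": os.getenv("FABRIC_LAKEHOUSE_ID", "<lakehouse-guid>"),
    "mysql_host": os.getenv("FABRIC_MYSQL_HOST", "<mysql-host>"),
    "mysql_port": int(os.getenv("FABRIC_MYSQL_PORT", "3306")),
    "mysql_database": os.getenv("FABRIC_MYSQL_DATABASE", "<database>"),
}
```

In the Laravel codebase the equivalent would be `config()` entries fed by `.env`, keeping credentials out of source control.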

---

## POC Execution Log

### Successful Full Run (2026-02-11)

```
Step 1: Get Access Token         -> OK
Step 2: Create MySQL Connection  -> OK (ID: c9fa26a6-7cdf-431f-a6a8-c1ac7cbe4263)
Step 3: Build Definition         -> OK (Mode: Batch | Tables: 3)
Step 4: Create Copy Job          -> OK (ID: 4eaadf6b-ccc7-4961-a431-b0e574e18432)
Step 5: Run Copy Job             -> OK (202 Accepted)
```

**Copy Job Name:** `MySQL to OneLake Copy Job - POC 2026-02-11 03:53`
**Tables Copied:** customers, orders, products

---

## Next Steps (Post-POC)

1. **Make configuration dynamic** — Replace static constants with database/config values
2. **Field mapping** — Implement `Map to destination` column-level mapping (currently uses TabularTranslator auto-mapping)
3. **CDC incremental support** — Wire up `incrementalColumn` parameter for CDC mode tables
4. **Error handling** — Add retry logic and better error reporting for failed jobs
5. **Job monitoring** — Poll job instance status after triggering a run
6. **Connection reuse** — Store created connection IDs in the database to avoid creating duplicates
7. **Multi-workspace support** — Accept workspace ID and lakehouse ID as parameters
