
Legacy Data Archive

FTP Mirror Dataset

A complete, publicly accessible mirror of the Companies House legacy SFTP server. Hosted on high-performance object storage to support modern data pipelines.

The access bottleneck

While modern APIs exist for individual data records, several key bulk datasets (including officers, mortgages, insolvencies, and new registrations) are still distributed only through a legacy SFTP server.

Accessing the official SFTP server requires manual provisioning by Companies House support, including a developer forum request and an SSH key exchange. Once connected, transfers can be slow and the directory structure lacks comprehensive documentation. This is a real hurdle for teams building automated ingestion workflows.

Infrastructure

High-performance public storage.

We have mirrored the entire SFTP server to a public object storage bucket, providing unrestricted access to both historical files and the most recent releases.

Speed >1 Gbps

High bandwidth availability significantly reduces download times for large data releases.

Integration No auth

A public bucket means there are no SSH keys to manage and no rate limits to work around in your ingestion pipeline.

Cost Free

The public mirror is provided free of charge to improve data accessibility.
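Because the bucket is public, a file can be fetched over plain HTTPS without any credentials. The sketch below uses only Python's standard library. It is a minimal illustration, not an official client: BASE_URL is a hypothetical placeholder for the mirror's real endpoint, and it assumes each object key is a simple path of components (for example a product code followed by a filename), which should be checked against the file browser.

```python
import shutil
import urllib.request
from urllib.parse import quote

# Hypothetical placeholder: replace with the mirror's real public bucket endpoint.
BASE_URL = "https://example-bucket.example.com"


def object_url(base_url: str, *parts: str) -> str:
    """Join key components onto the bucket URL, percent-encoding each one.

    Assumption: object keys are plain "/"-separated paths mirroring the
    SFTP directory layout.
    """
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{base_url.rstrip('/')}/{path}"


def download(url: str, dest: str, chunk_size: int = 1 << 20) -> None:
    """Stream a public object to disk in 1 MiB chunks; no credentials are sent."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
```

For example, download(object_url(BASE_URL, "prod216", "example-file.zip"), "example-file.zip") would fetch a hypothetical officers file. Streaming in fixed-size chunks keeps memory use flat even for multi-gigabyte releases.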

Available datasets

The mirror contains dozens of specialised data products. Below is a selection of the most commonly requested products currently available in the bucket. Use the interactive file browser for the complete catalogue of 29,000+ files, organised by product and production date.

Officers Bulk File

Popular

A comprehensive record of all current and historical officers.

Code prod216
Frequency Monthly
Size 11.76 GB

Liquidations

Daily updates for company liquidation records.

Code prod197
Frequency Daily
Size N/A

Mortgages

Corporate mortgage records. This product is the initial snapshot; use prod201 for daily updates.

Code prod199
Frequency Daily
Size 1.82 GB

Gazette

The official public record, including the London, Belfast and Edinburgh Gazettes.

Code prod202
Frequency Weekly
Size 21.33 MB

Disqualified Directors

Weekly snapshot of disqualified officers.

Code prod192
Frequency Weekly
Size 4.71 MB
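The Mortgages entry pairs a snapshot (prod199) with separate daily updates (prod201). A common way to combine the two is to load the snapshot and replay each day's updates over it in date order. The sketch below is hypothetical: it assumes every record has a stable key and that a later row simply replaces an earlier one. The real prod201 semantics (for example, how satisfactions or deletions are signalled) are not described here and must be taken from the product specification.

```python
from typing import Dict, Iterable, Mapping, Tuple

Record = Tuple[str, ...]


def apply_updates(
    snapshot: Mapping[str, Record],
    daily_updates: Iterable[Iterable[Tuple[str, Record]]],
) -> Dict[str, Record]:
    """Replay daily update batches over a snapshot, oldest batch first.

    Assumption: each update row carries a stable key and replaces any earlier
    row with that key. Deletions are not modelled in this sketch.
    """
    state = dict(snapshot)  # copy so the caller's snapshot is left untouched
    for batch in daily_updates:  # batches must already be sorted by date
        for key, record in batch:
            state[key] = record  # upsert: add new keys, overwrite existing ones
    return state
```

Keeping the snapshot immutable makes it easy to rebuild state from scratch if an update file turns out to be missing or out of order.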

Data format

While the delivery mechanism has been modernised, the underlying files retain their legacy structure. Most records are supplied as text files in which a run of fixed-width fields is followed by variable-width fields separated by a chevron (<) character.
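A record in this layout can be split in two passes: slice off the fixed-width columns, then split the remainder on the chevron. The sketch below assumes the column widths are supplied by the caller, because the real widths depend on the product and record type and are not given on this page; any widths used with it are hypothetical. It also treats the chevron as a field terminator, since both lines of the sample record below end with one.

```python
from typing import List, Sequence, Tuple


def parse_record(line: str, widths: Sequence[int]) -> Tuple[List[str], List[str]]:
    """Split one record into fixed-width fields and chevron-delimited fields.

    `widths` is a caller-supplied (hypothetical) layout; take the real values
    from the relevant product specification.
    """
    line = line.rstrip("\r\n")
    fixed: List[str] = []
    pos = 0
    for width in widths:
        fixed.append(line[pos:pos + width].strip())  # fixed fields are space-padded
        pos += width
    # The rest of the line holds the variable-width fields.
    variable = line[pos:].split("<")
    if variable and variable[-1] == "":
        variable.pop()  # a trailing "<" terminates the last field; drop the empty piece
    return fixed, variable
```

Empty fields between consecutive chevrons (as in "<<<<" in the sample) are preserved as empty strings, so field positions stay aligned with the specification.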

Open source parser

We are currently developing a high-performance open source parser to simplify the ingestion of these files. We aim to release support for the most common formats within the next 12 months.

officer_record.txt
FC0294761C                      00030031VITAMIN- SHOP NORTHERN IRELAND<
FC0294762301149774860001        20100316                197810          0127<MARCIN<NAKONOWSKI<<<<MOORE ST. MALL 58-66 PARNELL STREET<DUBLIN<DUBLIN<CO. DUBLIN<IRELAND<DIRECTOR<POLISH<REPUBLIC OF IRELAND<

Example of the mixed fixed-width and chevron-delimited format found in legacy distributions. Files like this typically need custom parsing logic before the data can be loaded into standard databases.
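As a starting point for loading, the chevron-delimited part of each record can be written to SQLite before the meaning of every column has been mapped. The sketch below stores fields in a long (record_no, position, value) table so that nothing is lost. The table name, its columns, and the assumption that the fixed-width prefix has already been removed are all illustrative choices, not part of the original format.

```python
import sqlite3
from typing import Iterable, List


def split_fields(record: str) -> List[str]:
    """Split on "<", treating a trailing chevron as a terminator."""
    parts = record.split("<")
    return parts[:-1] if record.endswith("<") else parts


def load_variable_fields(conn: sqlite3.Connection, records: Iterable[str]) -> None:
    """Insert each record's fields as (record_no, position, value) rows.

    Assumption: `records` holds only the variable-width portion of each line.
    record_no is the record's index within this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS record_fields ("
        "record_no INTEGER, position INTEGER, value TEXT)"
    )
    rows = (
        (record_no, position, value)
        for record_no, record in enumerate(records)
        for position, value in enumerate(split_fields(record))
    )
    conn.executemany("INSERT INTO record_fields VALUES (?, ?, ?)", rows)
    conn.commit()
```

Once the specification has been consulted, the long table can be pivoted into properly named columns with a single SQL query, without re-reading the source files.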