I have a job to do: a recital with 3 cameras and 1 audio source. I want to create a real-time live stream managed by an app embedded in a website, and I'm wondering whether Java is a good fit for my purposes. I'd like your opinion on this...

This is the situation:
A) I have
1 stereo mic connected to a Mac
3 IP cams connected to a modem
3 DV cams
B) I need
a small free app integrated into my website that lets anybody on the internet switch between the 3 video sources I offer in real time, while always hearing the audio source without interruption... like surveillance software, or like a virtual control room (even a 60-second buffer is fine, it doesn't matter; the main thing is that the 3 videos stay in sync with the audio signal).

I think I have three options:
OPTION A)
3 IP video streams from the 3 IP cams, through the modem, to the web
1 audio signal through the Mac to the web
+
1 app embedded in my website that people use to watch the performance, which lets them switch the camera view (between the 3 I offer) without stopping the audio signal... all in sync, like surveillance software or, better, like a virtual control room
OPTION B)
3 DV cams, each multiplexed in real time with the single audio signal from the mic, in sync (mic and DV cams are connected to the Mac)
then outputting these 3 multiplexed signals as 3 separate IP audio/video streams
+
1 app embedded in my website that people use to watch the performance on the web, which lets them switch the camera view (between the 3 multiplexed signals I offer as live streams) like a virtual control room
OPTION C)
3 DV cams into the Mac
1 mic into the same Mac
Software that multiplexes the audio with a single composite video built by gluing the 3 video inputs side by side (a sort of triple split-screen):
[1]+[2]+[3] = [1|2|3]
This software must turn the resulting audio + extra-wide video into a single live stream on the web
+
A simple app (embedded in my website) that receives this stream and works like a two-layer masking tool: the top layer is the mask and the bottom layer is the live stream. By moving the bottom stream layer left or right, the app hides 2 sectors of the composite video under the opaque part of the mask, so only 1 sector is shown:
[1|-|-] <- [-|2|-] -> [-|-|3]
This simulates a simple video mixer with synchronized audio.
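The layer-shifting trick in Option C reduces to one small calculation: how far to translate the stream layer so the chosen sector sits under the mask window. A TypeScript sketch, assuming three equal-width sectors (the function name and parameters are illustrative, not from any library):

```typescript
// Pixel offset to apply to the stream layer so that sector `sector`
// (0-based) of an N-sector composite [1|2|3] frame shows through the
// mask window. Assumes all sectors have equal width in pixels.
function sectorOffset(sector: number, sectorWidth: number, sectors = 3): number {
  if (!Number.isInteger(sector) || sector < 0 || sector >= sectors) {
    throw new RangeError(`no sector ${sector}`);
  }
  // Shifting the composite left by one sector width brings the next
  // sector into view: [1|2|3] shifted by -sectorWidth shows sector 2.
  return -sector * sectorWidth;
}
```

For example, with a 1920-pixel-wide composite (three 640-pixel sectors), showing camera 3 means translating the stream layer by `sectorOffset(2, 640)`, i.e. 1280 pixels to the left; the audio is untouched because it travels in the same single stream, which is why Option C keeps sync for free.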

This is the question:
which is the best option, and what kind of software do I need for this job (converting the IP streams, building the app that mixes the IP video, embedding the app in my website)?
Can Java help me, or should I use something else?

Thanks
Adriano